Service | Microsoft Docs article | Related commit history on GitHub | Change details |
---|---|---|---|
active-directory | Fido2 Compatibility | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/authentication/fido2-compatibility.md | The following tables show which transports are supported for each platform. Supp |||--|--| | Edge | ❌ | ❌ | ❌ | | Chrome | ✅ | ❌ | ❌ |-| Firefox | ✅ | ❌ | ❌ | +| Firefox | ❌ | ❌ | ❌ | ### iOS |
active-directory | Product Permissions Analytics Reports | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/cloud-infrastructure-entitlement-management/product-permissions-analytics-reports.md | You can view the Permissions Analytics Report information directly in the Permis 2. Locate the **Permissions Analytics Report** in the list, then select it. 3. Select which Authorization System you want to generate the PDF download for (AWS, Azure, or GCP). >[!NOTE]- > The PDF can only be downloaded for one Authorization System at a time. If more than one Authorization System is selected, the **Export PDF** button will be disabled. -4. To download the report in PDF format, click on **Export PDF**. + > You can download a PDF report for up to 10 authorization systems at one time. The authorization systems must be part of the same cloud environment (for example, 1-10 authorization systems that are all on Amazon Web Services (AWS)). The following message displays: **Successfully started to generate PDF report**. - Once the PDF is generated, the report is automatically sent to your email. + Once the PDF is generated, the report(s) are automatically sent to your email. <!## Add and remove tags in the Permissions analytics report |
active-directory | Msal Android Single Sign On | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/develop/msal-android-single-sign-on.md | The following browsers have been tested to see if they correctly redirect to the | - | :--: | -: | -: | -: | -: | -: | | Nexus 4 (API 17) | pass | pass | not applicable | not applicable | not applicable | not applicable | | Samsung S7 (API 25) | pass<sup>1</sup> | pass | pass | pass | fail | pass |-| Huawei (API 26) | pass<sup>2</sup> | pass | fail | pass | pass | pass | | Vivo (API 26) | pass | pass | pass | pass | pass | fail | | Pixel 2 (API 26) | pass | pass | pass | pass | fail | pass |-| Oppo | pass | not applicable<sup>3</sup> | not applicable | not applicable | not applicable | not applicable | +| Oppo | pass | not applicable<sup>2</sup> | not applicable | not applicable | not applicable | not applicable | | OnePlus (API 25) | pass | pass | pass | pass | fail | pass | | Nexus (API 28) | pass | pass | pass | pass | fail | pass | | MI | pass | pass | pass | pass | fail | pass | <sup>1</sup>Samsung's built-in browser is Samsung Internet.<br/>-<sup>2</sup>Huawei's built-in browser is Huawei Browser.<br/> -<sup>3</sup>The default browser can't be changed inside the Oppo device setting. +<sup>2</sup>The default browser can't be changed inside the Oppo device setting. ## Next steps |
active-directory | Howto Manage Local Admin Passwords | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/devices/howto-manage-local-admin-passwords.md | This feature is now available in the following Azure clouds: - Azure Global - Azure Government-- Microsoft Azure operated by 21Vianetated by 21Vianet+- Microsoft Azure operated by 21Vianet ### Operating system updates |
active-directory | Manage Device Identities | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/devices/manage-device-identities.md | To view or copy BitLocker keys, you need to be the owner of the device or have o - Security Administrator - Security Reader -## View and filter your devices (preview) +## View and filter your devices -In this preview, you have the ability to infinitely scroll, reorder columns, and select all devices. You can filter the device list by these device attributes: +You can filter the device list by these attributes: - Enabled state - Compliant state - Join type (Microsoft Entra joined, Microsoft Entra hybrid joined, Microsoft Entra registered) - Activity timestamp-- OS Type and Version+- OS type and OS version - Windows is displayed for Windows 11 and Windows 10 devices (with KB5006738). - Windows Server is displayed for [supported versions managed with Microsoft Defender for Endpoint](/mem/intune/protect/mde-security-integration#supported-platforms). - Device type (printer, secure VM, shared device, registered device) In this preview, you have the ability to infinitely scroll, reorder columns, and - Administrative unit - Owner -To enable the preview in the **All devices** view: --1. Sign in to the [Microsoft Entra admin center](https://entra.microsoft.com) as at least a [Global Reader](../roles/permissions-reference.md#global-reader). -1. Browse to **Identity** > **Devices** > **All devices**. -1. Select the **Preview features** button. -1. Turn on the toggle that says **Enhanced devices list experience**. Select **Apply**. -1. Refresh your browser. --You can now experience the enhanced **All devices** view. - ## Download devices Global readers, Cloud Device Administrators, Intune Administrators, and Global Administrators can use the **Download devices** option to export a CSV file that lists devices. You can apply filters to determine which devices to list. If you don't apply any filters, all devices are listed. An export task might run for as long as an hour, depending on your selections. If the export task exceeds 1 hour, it fails, and no file is output. The exported list includes these device identity attributes: `displayName,accountEnabled,operatingSystem,operatingSystemVersion,joinType (trustType),registeredOwners,userNames,mdmDisplayName,isCompliant,registrationTime,approximateLastSignInDateTime,deviceId,isManaged,objectId,profileType,systemLabels,model` +The following filters can be applied for the export task: ++- Enabled state +- Compliant state +- Join type +- Activity timestamp +- OS type +- Device type + ## Configure device settings If you want to manage device identities by using the Microsoft Entra admin center, the devices need to be either [registered or joined](overview.md) to Microsoft Entra ID. As an administrator, you can control the process of registering and joining devices by configuring the following device settings. |
active-directory | External Collaboration Settings Configure | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/external-identities/external-collaboration-settings-configure.md | External collaboration settings can be configured by using the Microsoft Graph A With the Guest Inviter role, you can give individual users the ability to invite guests without assigning them a global administrator or other admin role. Assign the Guest inviter role to individuals. Then make sure you set **Admins and users in the guest inviter role can invite** to **Yes**. -Here's an example that shows how to use PowerShell to add a user to the Guest Inviter role: +Here's an example that shows how to use Microsoft Graph PowerShell to add a user to the `Guest Inviter` role: +++```powershell ++Import-Module Microsoft.Graph.Identity.DirectoryManagement ++$roleName = "Guest Inviter" +$role = Get-MgDirectoryRole | where {$_.DisplayName -eq $roleName} +$userId = <User Id/User Principal Name> ++$DirObject = @{ + "@odata.id" = "https://graph.microsoft.com/v1.0/directoryObjects/$userId" + } ++New-MgDirectoryRoleMemberByRef -DirectoryRoleId $role.Id -BodyParameter $DirObject -``` -Add-MsolRoleMember -RoleObjectId 95e79109-95c0-4d8e-aee3-d01accf2d47b -RoleMemberEmailAddress <RoleMemberEmailAddress> ``` ## Sign-in logs for B2B users See the following articles on Microsoft Entra B2B collaboration: - [What is Microsoft Entra B2B collaboration?](what-is-b2b.md) - [Adding a B2B collaboration user to a role](./add-users-administrator.md)+ |
active-directory | Tenant Restrictions V2 | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/external-identities/tenant-restrictions-v2.md | Tenant restrictions v2 policies can't be directly enforced on non-Windows 10, Wi ### Migrate tenant restrictions v1 policies to v2 +Migration of tenant restrictions from v1 to v2 is a one-time operation. Once you've moved from TRv1 to TRv2 on the proxy, no client-side changes are required, and any subsequent policy changes are made in the cloud through the Microsoft Entra portal. + On your corporate proxy, you can move from tenant restrictions v1 to tenant restrictions v2 by changing this tenant restrictions v1 header: `Restrict-Access-To-Tenants: <allowed-tenant-list>` |
active-directory | Whats New Archive | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/fundamentals/whats-new-archive.md | All Devices List: - Columns can be reordered via drag and drop - Select all devices -For more information, see: [Manage devices in Azure AD using the Azure portal](../devices/manage-device-identities.md#view-and-filter-your-devices-preview). +For more information, see: [Manage devices in Azure AD using the Azure portal](../devices/manage-device-identities.md#view-and-filter-your-devices). Smart Lockout now synchronizes the lockout state across Azure AD data centers, s - + |
active-directory | Best Practices | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/roles/best-practices.md | Some roles include privileged permissions, such as the ability to update credent :::image type="content" source="./media/best-practices/privileged-role-assignments-warning.png" alt-text="Screenshot of the Microsoft Entra roles and administrators page that shows the privileged role assignments warning." lightbox="./media/best-practices/privileged-role-assignments-warning.png"::: - You can identity roles, permissions, and role assignments that are privileged by looking for the **PRIVILEGED** label. For more information, see [Privileged roles and permissions in Microsoft Entra ID](privileged-roles-permissions.md). + You can identify roles, permissions, and role assignments that are privileged by looking for the **PRIVILEGED** label. For more information, see [Privileged roles and permissions in Microsoft Entra ID](privileged-roles-permissions.md). <a name='7-use-groups-for-azure-ad-role-assignments-and-delegate-the-role-assignment'></a> |
active-directory | Amazon Business Provisioning Tutorial | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/saas-apps/amazon-business-provisioning-tutorial.md | This section guides you through the steps to configure the Microsoft Entra provi ![Screenshot of Token.](media/amazon-business-provisioning-tutorial/test-connection.png) + For **Tenant URL** and **Authorization endpoint** values please use the table below ++ |Country|Tenant URL|Authorization endpoint + |||| + |Canada|https://na.business-api.amazon.com/scim/v2/|https://www.amazon.ca/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |Germany|https://eu.business-api.amazon.com/scim/v2/|https://www.amazon.de/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |Spain|https://eu.business-api.amazon.com/scim/v2/|https://www.amazon.es/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |France|https://eu.business-api.amazon.com/scim/v2/|https://www.amazon.fr/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |GB/UK|https://eu.business-api.amazon.com/scim/v2/|https://www.amazon.co.uk/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |India|https://eu.business-api.amazon.com/scim/v2/|https://www.amazon.in/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |Italy|https://eu.business-api.amazon.com/scim/v2/|https://www.amazon.it/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |Japan|https://jp.business-api.amazon.com/scim/v2/|https://www.amazon.co.jp/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |Mexico|https://na.business-api.amazon.com/scim/v2/|https://www.amazon.com.mx/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + |US|https://na.business-api.amazon.com/scim/v2/|https://www.amazon.com/b2b/abws/oauth?state=1&redirect_uri=https://portal.azure.com/TokenAuthorize&applicationId=amzn1.sp.solution.ee27ec8c-1ee9-4c6b-9e68-26bdc37479d3| + 1. In the **Notification Email** field, enter the email address of a person or group who should receive the provisioning error notifications and select the **Send an email notification when a failure occurs** check box. ![Screenshot of Notification Email.](common/provisioning-notification-email.png) This section guides you through the steps to configure the Microsoft Entra provi |active|Boolean|| |emails[type eq "work"].value|String|| |name.givenName|String||- |name.givenName|String|| - |externalId|String|| + |name.familyName|String|| |externalId|String||+ |roles|List of appRoleAssignments [appRoleAssignments]|| 1. Under the **Mappings** section, select **Synchronize Microsoft Entra groups to Amazon Business**. |
active-directory | Diffchecker Provisioning Tutorial | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/saas-apps/diffchecker-provisioning-tutorial.md | + + Title: 'Tutorial: Configure Diffchecker for automatic user provisioning with Microsoft Entra ID' +description: Learn how to automatically provision and deprovision user accounts from Microsoft Entra ID to Diffchecker. +++writer: twimmers ++ms.assetid: fe6b1d92-06e7-4933-9ef0-7aecd6b9b495 ++++ Last updated : 10/09/2023++++# Tutorial: Configure Diffchecker for automatic user provisioning ++This tutorial describes the steps you need to perform in both Diffchecker and Microsoft Entra ID to configure automatic user provisioning. When configured, Microsoft Entra ID automatically provisions and deprovisions users to [Diffchecker](https://www.diffchecker.com) using the Microsoft Entra provisioning service. For important details on what this service does, how it works, and frequently asked questions, see [Automate user provisioning and deprovisioning to SaaS applications with Microsoft Entra ID](../app-provisioning/user-provisioning.md). ++## Supported capabilities +> [!div class="checklist"] +> * Create users in Diffchecker. +> * Remove users in Diffchecker when they do not require access anymore. +> * Keep user attributes synchronized between Microsoft Entra ID and Diffchecker. +> * [Single sign-on](diffchecker-tutorial.md) to Diffchecker (recommended). ++## Prerequisites ++The scenario outlined in this tutorial assumes that you already have the following prerequisites: ++* [A Microsoft Entra tenant](../develop/quickstart-create-new-tenant.md) +* A user account in Microsoft Entra ID with [permission](../roles/permissions-reference.md) to configure provisioning (for example, Application Administrator, Cloud Application administrator, Application Owner, or Global Administrator). +* A user account in Diffchecker with Admin permissions. ++## Step 1: Plan your provisioning deployment +* Learn about [how the provisioning service works](../app-provisioning/user-provisioning.md). +* Determine who will be in [scope for provisioning](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). +* Determine what data to [map between Microsoft Entra ID and Diffchecker](../app-provisioning/customize-application-attributes.md). ++## Step 2: Configure Diffchecker to support provisioning with Microsoft Entra ID +Contact Diffchecker support to configure Diffchecker to support provisioning with Microsoft Entra ID. ++## Step 3: Add Diffchecker from the Microsoft Entra application gallery ++Add Diffchecker from the Microsoft Entra application gallery to start managing provisioning to Diffchecker. If you have previously setup Diffchecker for SSO, you can use the same application. However it's recommended that you create a separate app when testing out the integration initially. Learn more about adding an application from the gallery [here](../manage-apps/add-application-portal.md). ++## Step 4: Define who will be in scope for provisioning ++The Microsoft Entra provisioning service allows you to scope who will be provisioned based on assignment to the application and or based on attributes of the user. If you choose to scope who will be provisioned to your app based on assignment, you can use the following [steps](../manage-apps/assign-user-or-group-access-portal.md) to assign users to the application. 
If you choose to scope who will be provisioned based solely on attributes of the user, you can use a scoping filter as described [here](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). ++* Start small. Test with a small set of users before rolling out to everyone. When scope for provisioning is set to assigned users, you can control this by assigning one or two users to the app. When scope is set to all users, you can specify an [attribute based scoping filter](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). ++* If you need more roles, you can [update the application manifest](../develop/howto-add-app-roles-in-azure-ad-apps.md) to add new roles. ++## Step 5: Configure automatic user provisioning to Diffchecker ++This section guides you through the steps to configure the Microsoft Entra provisioning service to create, update, and disable users in Diffchecker based on user assignments in Microsoft Entra ID. ++<a name='to-configure-automatic-user-provisioning-for-Diffchecker-in-azure-ad'></a> ++### To configure automatic user provisioning for Diffchecker in Microsoft Entra ID: ++1. Sign in to the [Microsoft Entra admin center](https://entra.microsoft.com) as at least a [Cloud Application Administrator](../roles/permissions-reference.md#cloud-application-administrator). +1. Browse to **Identity** > **Applications** > **Enterprise applications** ++ ![Screenshot of Enterprise applications blade.](common/enterprise-applications.png) ++1. In the applications list, select **Diffchecker**. ++ ![Screenshot of the Diffchecker link in the Applications list.](common/all-applications.png) ++1. Select the **Provisioning** tab. ++ ![Screenshot of Provisioning tab.](common/provisioning.png) ++1. Set the **Provisioning Mode** to **Automatic**. ++ ![Screenshot of Provisioning tab automatic.](common/provisioning-automatic.png) ++1. Under the **Admin Credentials** section, input your Diffchecker Tenant URL and Secret Token. Click **Test Connection** to ensure Microsoft Entra ID can connect to Diffchecker. If the connection fails, ensure your Diffchecker account has Admin permissions and try again. ++ ![Screenshot of Token.](common/provisioning-testconnection-tenanturltoken.png) ++1. In the **Notification Email** field, enter the email address of a person who should receive the provisioning error notifications and select the **Send an email notification when a failure occurs** check box. ++ ![Screenshot of Notification Email.](common/provisioning-notification-email.png) ++1. Select **Save**. ++1. Under the **Mappings** section, select **Synchronize Microsoft Entra users to Diffchecker**. ++1. Review the user attributes that are synchronized from Microsoft Entra ID to Diffchecker in the **Attribute-Mapping** section. The attributes selected as **Matching** properties are used to match the user accounts in Diffchecker for update operations. If you choose to change the [matching target attribute](../app-provisioning/customize-application-attributes.md), you need to ensure that the Diffchecker API supports filtering users based on that attribute. Select the **Save** button to commit any changes. ++ |Attribute|Type|Supported for filtering|Required by Diffchecker| + ||||| + |userName|String|✓|✓ + |active|Boolean|| + |emails[type eq "work"].value|String||✓ + |name.givenName|String|| + |name.familyName|String|| ++1. 
To configure scoping filters, refer to the following instructions provided in the [Scoping filter tutorial](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). ++1. To enable the Microsoft Entra provisioning service for Diffchecker, change the **Provisioning Status** to **On** in the **Settings** section. ++ ![Screenshot of Provisioning Status Toggled On.](common/provisioning-toggle-on.png) ++1. Define the users that you would like to provision to Diffchecker by choosing the desired values in **Scope** in the **Settings** section. ++ ![Screenshot of Provisioning Scope.](common/provisioning-scope.png) ++1. When you're ready to provision, click **Save**. ++ ![Screenshot of Saving Provisioning Configuration.](common/provisioning-configuration-save.png) ++This operation starts the initial synchronization cycle of all users defined in **Scope** in the **Settings** section. The initial cycle takes longer to perform than subsequent cycles, which occur approximately every 40 minutes as long as the Microsoft Entra provisioning service is running. ++## Step 6: Monitor your deployment +Once you've configured provisioning, use the following resources to monitor your deployment: ++* Use the [provisioning logs](../reports-monitoring/concept-provisioning-logs.md) to determine which users have been provisioned successfully or unsuccessfully +* Check the [progress bar](../app-provisioning/application-provisioning-when-will-provisioning-finish-specific-user.md) to see the status of the provisioning cycle and how close it's to completion +* If the provisioning configuration seems to be in an unhealthy state, the application goes into quarantine. Learn more about quarantine states [here](../app-provisioning/application-provisioning-quarantine-status.md). ++## More resources ++* [Managing user account provisioning for Enterprise Apps](../app-provisioning/configure-automatic-user-provisioning-portal.md) +* [What is application access and single sign-on with Microsoft Entra ID?](../manage-apps/what-is-single-sign-on.md) ++## Next steps ++* [Learn how to review logs and get reports on provisioning activity](../app-provisioning/check-status-user-account-provisioning.md) |
active-directory | Gong Provisioning Tutorial | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/saas-apps/gong-provisioning-tutorial.md | The scenario outlined in this tutorial assumes that you already have the followi 1. In the **Update settings** area, define how settings can be managed for this assignment: * Select **Manual editing** to manage data capture and permission settings for users in this assignment in Gong. After you create the assignment: if you make changes to group settings in Microsoft Entra ID, they will not be pushed to Gong. However, you can edit the group settings manually in Gong.- * (Recommended) Select **Automatic updates** to give Microsoft Entra ID Governance over data capture and permission settings in Gong. + * (Recommended) Select **Automatic updates** to give Microsoft Entra ID control over data capture and permission settings in Gong. Define data capture and permission settings in Gong only when creating an assignment. Thereafter, other changes will only be applied to users in groups with this assignment when pushed from Microsoft Entra ID. 1. Click **ADD ASSIGNMENT**. 1. For orgs that don't have assignments (step 3), select the permission profile to apply to automatically provisioned users. |
active-directory | Team Today Provisioning Tutorial | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/active-directory/saas-apps/team-today-provisioning-tutorial.md | + + Title: 'Tutorial: Configure Team Today for automatic user provisioning with Microsoft Entra ID' +description: Learn how to automatically provision and deprovision user accounts from Microsoft Entra ID to Team Today. +++writer: twimmers ++ms.assetid: 9d3f64dc-d18a-44e4-a13b-d5e37e2aac3a ++++ Last updated : 10/09/2023++++# Tutorial: Configure Team Today for automatic user provisioning ++This tutorial describes the steps you need to perform in both Team Today and Microsoft Entra ID to configure automatic user provisioning. When configured, Microsoft Entra ID automatically provisions and deprovisions users to [Team Today](https://team-today.com) using the Microsoft Entra provisioning service. For important details on what this service does, how it works, and frequently asked questions, see [Automate user provisioning and deprovisioning to SaaS applications with Microsoft Entra ID](../app-provisioning/user-provisioning.md). ++## Supported capabilities +> [!div class="checklist"] +> * Create users in Team Today. +> * Remove users in Team Today when they do not require access anymore. +> * Keep user attributes synchronized between Microsoft Entra ID and Team Today. +> * [Single sign-on](../manage-apps/add-application-portal-setup-oidc-sso.md) to Team Today (recommended). ++## Prerequisites ++The scenario outlined in this tutorial assumes that you already have the following prerequisites: ++* [A Microsoft Entra tenant](../develop/quickstart-create-new-tenant.md) +* A user account in Microsoft Entra ID with [permission](../roles/permissions-reference.md) to configure provisioning (for example, Application Administrator, Cloud Application administrator, Application Owner, or Global Administrator). +* A user account in Team Today with Admin permissions. ++## Step 1: Plan your provisioning deployment +* Learn about [how the provisioning service works](../app-provisioning/user-provisioning.md). +* Determine who will be in [scope for provisioning](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). +* Determine what data to [map between Microsoft Entra ID and Team Today](../app-provisioning/customize-application-attributes.md). ++## Step 2: Configure Team Today to support provisioning with Microsoft Entra ID +Contact Team Today support to configure Team Today to support provisioning with Microsoft Entra ID. ++## Step 3: Add Team Today from the Microsoft Entra application gallery ++Add Team Today from the Microsoft Entra application gallery to start managing provisioning to Team Today. If you have previously setup Team Today for SSO, you can use the same application. However it's recommended that you create a separate app when testing out the integration initially. Learn more about adding an application from the gallery [here](../manage-apps/add-application-portal.md). ++## Step 4: Define who will be in scope for provisioning ++The Microsoft Entra provisioning service allows you to scope who will be provisioned based on assignment to the application and or based on attributes of the user. If you choose to scope who will be provisioned to your app based on assignment, you can use the following [steps](../manage-apps/assign-user-or-group-access-portal.md) to assign users to the application. 
If you choose to scope who will be provisioned based solely on attributes of the user, you can use a scoping filter as described [here](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). ++* Start small. Test with a small set of users before rolling out to everyone. When scope for provisioning is set to assigned users, you can control this by assigning one or two users to the app. When scope is set to all users, you can specify an [attribute based scoping filter](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). ++* If you need more roles, you can [update the application manifest](../develop/howto-add-app-roles-in-azure-ad-apps.md) to add new roles. ++## Step 5: Configure automatic user provisioning to Team Today ++This section guides you through the steps to configure the Microsoft Entra provisioning service to create, update, and disable users in Team Today based on user assignments in Microsoft Entra ID. ++<a name='to-configure-automatic-user-provisioning-for-Team Today-in-azure-ad'></a> ++### To configure automatic user provisioning for Team Today in Microsoft Entra ID: ++1. Sign in to the [Microsoft Entra admin center](https://entra.microsoft.com) as at least a [Cloud Application Administrator](../roles/permissions-reference.md#cloud-application-administrator). +1. Browse to **Identity** > **Applications** > **Enterprise applications** ++ ![Screenshot of Enterprise applications blade.](common/enterprise-applications.png) ++1. In the applications list, select **Team Today**. ++ ![Screenshot of the Team Today link in the Applications list.](common/all-applications.png) ++1. Select the **Provisioning** tab. ++ ![Screenshot of Provisioning tab.](common/provisioning.png) ++1. Set the **Provisioning Mode** to **Automatic**. ++ ![Screenshot of Provisioning tab automatic.](common/provisioning-automatic.png) ++1. Under the **Admin Credentials** section, input your Team Today Tenant URL and Secret Token. Click **Test Connection** to ensure Microsoft Entra ID can connect to Team Today. If the connection fails, ensure your Team Today account has Admin permissions and try again. ++ ![Screenshot of Token.](common/provisioning-testconnection-tenanturltoken.png) ++1. In the **Notification Email** field, enter the email address of a person who should receive the provisioning error notifications and select the **Send an email notification when a failure occurs** check box. ++ ![Screenshot of Notification Email.](common/provisioning-notification-email.png) ++1. Select **Save**. ++1. Under the **Mappings** section, select **Synchronize Microsoft Entra users to Team Today**. ++1. Review the user attributes that are synchronized from Microsoft Entra ID to Team Today in the **Attribute-Mapping** section. The attributes selected as **Matching** properties are used to match the user accounts in Team Today for update operations. If you choose to change the [matching target attribute](../app-provisioning/customize-application-attributes.md), you need to ensure that the Team Today API supports filtering users based on that attribute. Select the **Save** button to commit any changes. ++ |Attribute|Type|Supported for filtering|Required by Team Today| + ||||| + |userName|String|✓|✓ + |externalId|String|✓|✓ + |active|Boolean||✓ + |name.givenName|String||✓ + |name.familyName|String||✓ + |urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:department|String|| ++1. 
To configure scoping filters, refer to the following instructions provided in the [Scoping filter tutorial](../app-provisioning/define-conditional-rules-for-provisioning-user-accounts.md). ++1. To enable the Microsoft Entra provisioning service for Team Today, change the **Provisioning Status** to **On** in the **Settings** section. ++ ![Screenshot of Provisioning Status Toggled On.](common/provisioning-toggle-on.png) ++1. Define the users that you would like to provision to Team Today by choosing the desired values in **Scope** in the **Settings** section. ++ ![Screenshot of Provisioning Scope.](common/provisioning-scope.png) ++1. When you're ready to provision, click **Save**. ++ ![Screenshot of Saving Provisioning Configuration.](common/provisioning-configuration-save.png) ++This operation starts the initial synchronization cycle of all users defined in **Scope** in the **Settings** section. The initial cycle takes longer to perform than subsequent cycles, which occur approximately every 40 minutes as long as the Microsoft Entra provisioning service is running. ++## Step 6: Monitor your deployment +Once you've configured provisioning, use the following resources to monitor your deployment: ++* Use the [provisioning logs](../reports-monitoring/concept-provisioning-logs.md) to determine which users have been provisioned successfully or unsuccessfully +* Check the [progress bar](../app-provisioning/application-provisioning-when-will-provisioning-finish-specific-user.md) to see the status of the provisioning cycle and how close it's to completion +* If the provisioning configuration seems to be in an unhealthy state, the application goes into quarantine. Learn more about quarantine states [here](../app-provisioning/application-provisioning-quarantine-status.md). ++## More resources ++* [Managing user account provisioning for Enterprise Apps](../app-provisioning/configure-automatic-user-provisioning-portal.md) +* [What is application access and single sign-on with Microsoft Entra ID?](../manage-apps/what-is-single-sign-on.md) ++## Next steps ++* [Learn how to review logs and get reports on provisioning activity](../app-provisioning/check-status-user-account-provisioning.md) |
ai-services | Authentication | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/authentication.md | description: "There are three ways to authenticate a request to an Azure AI serv -+ Last updated 08/30/2023 |
ai-services | Autoscale | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/autoscale.md | Title: Use the autoscale feature description: Learn how to use the autoscale feature for Azure AI services to dynamically adjust the rate limit of your service. -+ Last updated 06/27/2022 |
ai-services | Cognitive Services And Machine Learning | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-and-machine-learning.md | |
ai-services | Cognitive Services Container Support | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-container-support.md | |
ai-services | Cognitive Services Custom Subdomains | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-custom-subdomains.md | description: Custom subdomain names for each Azure AI services resource are crea -+ Last updated 12/04/2020 |
ai-services | Cognitive Services Data Loss Prevention | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-data-loss-prevention.md | Title: Data loss prevention description: Azure AI services data loss prevention capabilities allow customers to configure the list of outbound URLs their Azure AI services resources are allowed to access. This configuration creates another level of control for customers to prevent data loss. -+ Last updated 03/31/2023 |
ai-services | Cognitive Services Development Options | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-development-options.md | |
ai-services | Cognitive Services Environment Variables | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-environment-variables.md | description: "This guide shows you how to set and retrieve environment variables -+ Last updated 09/09/2022 |
ai-services | Cognitive Services Limited Access | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-limited-access.md | description: Azure AI services that are available with Limited Access are descri -+ Last updated 10/27/2022 |
ai-services | Cognitive Services Support Options | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-support-options.md | description: How to obtain help and support for questions and problems when you -+ Last updated 06/28/2022 |
ai-services | Cognitive Services Virtual Networks | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/cognitive-services-virtual-networks.md | description: Configure layered network security for your Azure AI services resou -+ Last updated 08/10/2023 |
ai-services | Commitment Tier | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/commitment-tier.md | description: Learn how to sign up for commitment tier pricing, which is differen -+ Last updated 12/01/2022 |
ai-services | Create Account Bicep | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/create-account-bicep.md | description: Create an Azure AI service resource with Bicep. keywords: Azure AI services, cognitive solutions, cognitive intelligence, cognitive artificial intelligence -+ Last updated 01/19/2023 |
ai-services | Create Account Resource Manager Template | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/create-account-resource-manager-template.md | keywords: Azure AI services, cognitive solutions, cognitive intelligence, cognit -+ Last updated 09/01/2022 |
ai-services | Create Account Terraform | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/create-account-terraform.md | Title: 'Quickstart: Create an Azure AI services resource using Terraform' description: 'In this article, you create an Azure AI services resource using Terraform' keywords: Azure AI services, cognitive solutions, cognitive intelligence, cognitive artificial intelligence -+ Last updated 4/14/2023 |
ai-services | Diagnostic Logging | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/diagnostic-logging.md | description: This guide provides step-by-step instructions to enable diagnostic -+ Last updated 07/19/2021 |
ai-services | Disable Local Auth | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/disable-local-auth.md | description: "This article describes disabling local authentication in Azure AI -+ Last updated 09/22/2023 |
ai-services | Language Support | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/language-support.md | description: Azure AI services enable you to build applications that see, hear, -+ Last updated 07/18/2023 |
ai-services | Multi Service Resource | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/multi-service-resource.md | |
ai-services | Managed Identity | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/openai/how-to/managed-identity.md | In the following sections, you'll use the Azure CLI to assign roles, and obtain - An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a> - Access granted to the Azure OpenAI Service in the desired Azure subscription-- Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at [https://aka.ms/oai/access</a>. Open an issue on this repo to contact us if you have an issue.+- Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the [Request Access to Azure OpenAI Service form](https://aka.ms/oai/access). Open an issue on this repo to contact us if you have an issue. - [Custom subdomain names are required to enable features like Azure Active Directory (Azure AD) for authentication.]( ../../cognitive-services-custom-subdomains.md) |
ai-services | Role Based Access Control | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/openai/how-to/role-based-access-control.md | recommendations: false # Role-based access control for Azure OpenAI Service -Azure OpenAI Service supports Azure role-based access control (Azure RBAC), an authorization system for managing individual access to Azure resources. Using Azure RBAC, you assign different team members different levels of permissions based on their needs for a given project. For more information, see the [Azure RBAC documentation](../../../role-based-access-control/index.yml) for more information. +Azure OpenAI Service supports Azure role-based access control (Azure RBAC), an authorization system for managing individual access to Azure resources. Using Azure RBAC, you assign different team members different levels of permissions based on their needs for a given project. For more information, see the [Azure RBAC documentation](../../../role-based-access-control/index.yml). ## Add role assignment to an Azure OpenAI resource |
ai-services | Plan Manage Costs | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/plan-manage-costs.md | description: Learn how to plan for and manage costs for Azure AI services by usi -+ Last updated 11/03/2021 |
ai-services | Policy Reference | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/policy-reference.md | description: Lists Azure Policy built-in policy definitions for Azure AI service Last updated 09/19/2023 -+ |
ai-services | Recover Purge Resources | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/recover-purge-resources.md | |
ai-services | Responsible Use Of Ai Overview | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/responsible-use-of-ai-overview.md | description: Azure AI services provides information and guidelines on how to res -+ Last updated 1/10/2022 |
ai-services | Rotate Keys | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/rotate-keys.md | description: "Learn how to rotate API keys for better security, without interrup -+ Last updated 11/08/2022 |
ai-services | Security Controls Policy | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/security-controls-policy.md | Last updated 09/19/2023 -+ # Azure Policy Regulatory Compliance controls for Azure AI services |
ai-services | Security Features | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/security-features.md | description: Learn about the security considerations for Azure AI services usage -+ Last updated 12/02/2022 |
ai-services | Create Translator Resource | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/translator/create-translator-resource.md | Title: Create a Translator resource -description: This article shows you how to create an Azure AI Translator resource. +description: Learn how to create an Azure AI Translator resource and retrieve your API key and endpoint URL in the Azure portal. Last updated 09/06/2023 # Create a Translator resource -In this article, you learn how to create a Translator resource in the Azure portal. [Azure AI Translator](translator-overview.md) is a cloud-based machine translation service that is part of the [Azure AI services](../what-are-ai-services.md) family. Azure resources are instances of services that you create. All API requests to Azure AI services require an **endpoint** URL and a read-only **key** for authenticating access. +In this article, you learn how to create a Translator resource in the Azure portal. [Azure AI Translator](translator-overview.md) is a cloud-based machine translation service that is part of the [Azure AI services](../what-are-ai-services.md) family. Azure resources are instances of services that you create. All API requests to Azure AI services require an *endpoint* URL and a read-only *key* for authenticating access. ## Prerequisites -To get started, you need an active [**Azure account**](https://azure.microsoft.com/free/cognitive-services/). If you don't have one, you can [**create a free 12-month subscription**](https://azure.microsoft.com/free/). +To get started, you need an active [**Azure account**](https://azure.microsoft.com/free/cognitive-services/). If you don't have one, you can [**create a free 12-month subscription**](https://azure.microsoft.com/free/). ## Create your resource -The Translator service can be accessed through two different resource types: +With your Azure account, you can access the Translator service through two different resource types: * [**Single-service**](https://portal.azure.com/#create/Microsoft.CognitiveServicesTextTranslation) resource types enable access to a single service API key and endpoint. -* [**Multi-service**](https://portal.azure.com/#create/Microsoft.CognitiveServicesAllInOne) resource types enable access to multiple Azure AI services using a single API key and endpoint. +* [**Multi-service**](https://portal.azure.com/#create/Microsoft.CognitiveServicesAllInOne) resource types enable access to multiple Azure AI services by using a single API key and endpoint. ## Complete your project and instance details +After you decide which resource type you want to use to access the Translator service, you can enter the details for your project and instance. + 1. **Subscription**. Select one of your available Azure subscriptions. 1. **Resource Group**. You can create a new resource group or add your resource to a pre-existing resource group that shares the same lifecycle, permissions, and policies. 1. **Resource Region**. Choose **Global** unless your business or application requires a specific region. If you're planning on using the Document Translation feature with [managed identity authorization](document-translation/how-to-guides/create-use-managed-identities.md), choose a geographic region such as **East US**. -1. **Name**. Enter the name you have chosen for your resource. The name you choose must be unique within Azure. +1. **Name**. Enter a name for your resource. The name you choose must be unique within Azure.
> [!NOTE]- > If you are using a Translator feature that requires a custom domain endpoint, such as Document Translation, the value that you enter in the Name field will be the custom domain name parameter for the endpoint. + > If you're using a Translator feature that requires a custom domain endpoint, such as Document Translation, the value that you enter in the Name field will be the custom domain name parameter for the endpoint. 1. **Pricing tier**. Select a [pricing tier](https://azure.microsoft.com/pricing/details/cognitive-services/translator) that meets your needs: * Each subscription has a free tier. * The free tier has the same features and functionality as the paid plans and doesn't expire.- * Only one free tier is available per subscription. - * Document Translation is supported in paid tiers. The Language Studio only supports the S1 or D3 instance tiers. We suggest you select the Standard S1 instance tier to try Document Translation. + * Only one free tier resource is available per subscription. + * Document Translation is supported in paid tiers. The Language Studio only supports the S1 or D3 instance tiers. If you just want to try Document Translation, select the Standard S1 instance tier. -1. If you've created a multi-service resource, you need to confirm more usage details via the check boxes. +1. If you've created a multi-service resource, the links at the bottom of the **Basics** tab provide technical documentation regarding the appropriate operation of the service. 1. Select **Review + Create**. -1. Review the service terms and select **Create** to deploy your resource. +1. Review the service terms, and select **Create** to deploy your resource. 1. After your resource has successfully deployed, select **Go to resource**. All Azure AI services API requests require an endpoint URL and a read-only key f * **Authentication keys**. Your key is a unique string that is passed on every request to the Translation service. You can pass your key through a query-string parameter or by specifying it in the HTTP request header. -* **Endpoint URL**. Use the Global endpoint in your API request unless you need a specific Azure region or custom endpoint. *See* [Base URLs](reference/v3-0-reference.md#base-urls). The Global endpoint URL is `api.cognitive.microsofttranslator.com`. +* **Endpoint URL**. Use the Global endpoint in your API request unless you need a specific Azure region or custom endpoint. For more information, see [Base URLs](reference/v3-0-reference.md#base-urls). The Global endpoint URL is `api.cognitive.microsofttranslator.com`. ## Get your authentication keys and endpoint -1. After your new resource deploys, select **Go to resource** or navigate directly to your resource page. -1. In the left rail, under *Resource Management*, select **Keys and Endpoint**. -1. Copy and paste your keys and endpoint URL in a convenient location, such as *Microsoft Notepad*. +To authenticate your connection to your Translator resource, you'll need to find its keys and endpoint. ++1. After your new resource deploys, select **Go to resource** or go to your resource page. +1. In the left navigation pane, under **Resource Management**, select **Keys and Endpoint**. +1. Copy and paste your keys and endpoint URL in a convenient location, such as Notepad. ## Create a Text Translation client Text Translation supports both [global and regional endpoints](#complete-your-pr > > Deleting a resource group also deletes all resources contained in the group.
-To remove an Azure AI multi-service or Translator resource, you can **delete the resource** or **delete the resource group**. - To delete the resource: -1. Navigate to your Resource Group in the Azure portal. +1. Search and select **Resource groups** in the Azure portal, and select your resource group. 1. Select the resources to be deleted by selecting the adjacent check box. 1. Select **Delete** from the top menu near the right edge.-1. Type *yes* in the **Deleted Resources** dialog box. +1. Enter *delete* in the **Delete Resources** dialog box. 1. Select **Delete**. To delete the resource group: -1. Navigate to your Resource Group in the Azure portal. -1. Select the **Delete resource group** from the top menu bar near the left edge. +1. Go to your Resource Group in the Azure portal. +1. Select **Delete resource group** from the top menu bar. 1. Confirm the deletion request by entering the resource group name and selecting **Delete**. -## How to get started with Translator +## How to get started with Azure AI Translator REST APIs In our quickstart, you learn how to use the Translator service with REST APIs. > [!div class="nextstepaction"] > [Get Started with Translator](quickstart-text-rest-api.md) -## More resources +## Next Steps -* [Microsoft Translator code samples](https://github.com/MicrosoftTranslator). Multi-language Translator code samples are available on GitHub. +* [Microsoft Translator code samples](https://github.com/MicrosoftTranslator). Multi-language Translator code samples are available on GitHub. * [Microsoft Translator Support Forum](https://www.aka.ms/TranslatorForum) * [Get Started with Azure (3-minute video)](https://azure.microsoft.com/get-started/?b=16.24) |
ai-services | Use Key Vault | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/use-key-vault.md | Title: Develop Azure AI services applications with Key Vault description: Learn how to develop Azure AI services applications securely by using Key Vault. -+ Last updated 09/13/2022 |
ai-services | What Are Ai Services | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/what-are-ai-services.md | |
aks | Auto Upgrade Node Image | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/aks/auto-upgrade-node-image.md | -# Auto-upgrade Azure Kubernetes Service cluster node OS images +# Auto-upgrade Azure Kubernetes Service cluster node OS images + AKS now supports the node OS auto-upgrade channel, an exclusive channel dedicated to controlling node-level OS security updates. This channel can't be used for cluster-level Kubernetes version upgrades. ## How does node OS auto-upgrade work with cluster auto-upgrade? The default cadence means there's no planned maintenance window applied. |Channel|Updates Ownership|Default cadence| |||-| `Unmanaged`|OS driven security updates. AKS has no control over these updates|Nightly around 6AM UTC for Ubuntu and Mariner, Windows every month.| -| `SecurityPatch`|AKS|Weekly| -| `NodeImage`|AKS|Weekly| +| `Unmanaged`|OS driven security updates. AKS has no control over these updates.|Nightly around 6AM UTC for Ubuntu and Azure Linux. Monthly for Windows.| +| `SecurityPatch`|AKS|Weekly.| +| `NodeImage`|AKS|Weekly.| ## Prerequisites |
aks | Windows Faq | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/aks/windows-faq.md | This article outlines some of the frequently asked questions and OS concepts for AKS uses Windows Server 2019 and Windows Server 2022 as the host OS version and only supports process isolation. Container images built by using other Windows Server versions are not supported. For more information, see [Windows container version compatibility][windows-container-compat]. For Kubernetes version 1.25 and higher, Windows Server 2022 is the default operating system. Windows Server 2019 is being retired after Kubernetes version 1.32 reaches end of life (EOL) and won't be supported in future releases. For more information about this retirement, see the [AKS release notes][aks-release-notes]. -## Is Kubernetes different on Windows and Linux? --Windows Server node pool support includes some limitations that are part of the upstream Windows Server in Kubernetes project. These limitations are not specific to AKS. For more information on the upstream support from the Kubernetes project, see the [Supported functionality and limitations][upstream-limitations] section of the [Intro to Windows support in Kubernetes][intro-windows] document. --Historically, Kubernetes is Linux-focused. Many examples used in the upstream [Kubernetes.io][kubernetes] website are intended for use on Linux nodes. When you create deployments that use Windows Server containers, the following considerations at the OS level apply: --- **Identity**: Linux identifies a user by an integer user identifier (UID). A user also has an alphanumeric user name for logging on, which Linux translates to the user's UID. Similarly, Linux identifies a user group by an integer group identifier (GID) and translates a group name to its corresponding GID.- Windows Server uses a larger binary security identifier (SID) that's stored in the Windows Security Access Manager (SAM) database. This database is not shared between the host and containers, or between containers. -- **File permissions**: Windows Server uses an access control list based on SIDs, rather than a bitmask of permissions and UID+GID.-- **File paths**: The convention on Windows Server is to use \ instead of /. - In pod specs that mount volumes, specify the path correctly for Windows Server containers. For example, rather than a mount point of */mnt/volume* in a Linux container, specify a drive letter and location such as */K/Volume* to mount as the *K:* drive. - ## What kind of disks are supported for Windows? Azure Disks and Azure Files are the supported volume types, and are accessed as NTFS volumes in the Windows Server container. |
aks | Windows Vs Linux Containers | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/aks/windows-vs-linux-containers.md | + + Title: Windows container considerations in Kubernetes ++description: See the Windows container considerations in Kubernetes. + Last updated : 10/05/2023++++++# Windows container considerations in Kubernetes ++When you create deployments that use Windows Server containers on Azure Kubernetes Service (AKS), there are a few differences relative to Linux deployments you should keep in mind. For a detailed comparison of the differences between Windows and Linux in upstream Kubernetes, see [Windows containers in Kubernetes](https://kubernetes.io/docs/concepts/windows/intro/). ++Some of the major differences include: ++- **Identity**: Windows Server uses a larger binary security identifier (SID) that's stored in the Windows Security Access Manager (SAM) database. This database isn't shared between the host and containers or between containers. +- **File permissions**: Windows Server uses an access control list based on SIDs rather than a bitmask of permissions and UID+GID. +- **File paths**: The convention on Windows Server is to use \ instead of /. In pod specs that mount volumes, specify the path correctly for Windows Server containers. For example, rather than a mount point of */mnt/volume* in a Linux container, specify a drive letter and location such as */K/Volume* to mount as the *K:* drive. ++> [!NOTE] +> For Kubernetes versions 1.25 and higher, Windows Server 2022 is the default OS. Windows Server 2019 is being retired after Kubernetes version 1.32 reaches end-of-life (EOL) and won't be supported in future releases. For more information, see the [AKS release notes][aks-release-notes]. ++This article covers important considerations to keep in mind when using Windows containers instead of Linux containers in Kubernetes. For an in-depth comparison of Windows and Linux containers, see [Comparison with Linux][comparison-with-linux]. ++## Considerations ++| Feature | Windows considerations | +|--|:--| +| [Cluster creation][cluster-configuration] | • The first system node pool *must* be Linux.<br/> • AKS Windows clusters have a maximum limit of 10 node pools.<br/> • AKS Windows clusters have a maximum limit of 100 nodes in each node pool.<br/> • The Windows Server node pool name has a limit of six characters. | +| [Privileged containers][privileged-containers] | Not supported. The equivalent is **HostProcess containers (HPC)**. | +| [HPC containers][hpc-containers] | • HostProcess containers are the Windows alternative to Linux privileged containers. For more information, see [Create a Windows HostProcess pod](https://kubernetes.io/docs/tasks/configure-pod-container/create-hostprocess-pod/). | +| [Azure Network Policy Manager (Azure)][azure-network-policy] | Azure Network Policy Manager doesn't support:<br/> • Named ports<br/> • SCTP protocol<br/> • Negative match labels or namespace selectors (all labels except "debug=true")<br/> • "except" CIDR blocks (a CIDR with exceptions)<br/> • Windows Server 2019<br/> | +| [Node upgrade][node-upgrade] | Windows Server nodes on AKS don't automatically apply Windows updates. Instead, you perform a node pool upgrade or [node image upgrade][node-image-upgrade]. These upgrades deploy new nodes with the latest Windows Server 2019 and Windows Server 2022 base node image and security patches. | +| [AKS Image Cleaner][aks-image-cleaner] | Not supported. | +| [BYOCNI][byo-cni] | Not supported. | +| [Open Service Mesh][open-service-mesh] | Not supported. | +| [GPU][gpu] | Not supported. | +| [Multi-instance GPU][multi-instance-gpu] | Not supported. | +| [Generation 2 VMs (preview)][gen-2-vms] | Supported in preview. | +| [Custom node config][custom-node-config] | • Custom node config has two configurations:<br/> • [kubelet][custom-kubelet-parameters]: Supported in preview.<br/> • OS config: Not supported. | ++## Next steps ++For more information on Windows containers, see the [Windows Server containers FAQ][windows-server-containers-faq]. ++<!-- LINKS - external --> +[aks-release-notes]: https://github.com/Azure/AKS/releases +[comparison-with-linux]: https://kubernetes.io/docs/concepts/windows/intro/#compatibility-linux-similarities ++<!-- LINKS - internal --> +[cluster-configuration]: ../aks/learn/quick-windows-container-deploy-cli.md#limitations +[privileged-containers]: use-windows-hpc.md#limitations +[hpc-containers]: use-windows-hpc.md#limitations +[node-upgrade]: ./manage-node-pools.md#upgrade-a-single-node-pool +[aks-image-cleaner]: image-cleaner.md#limitations +[windows-server-containers-faq]: windows-faq.md +[azure-network-policy]: use-network-policies.md#overview-of-network-policy +[node-image-upgrade]: node-image-upgrade.md +[byo-cni]: use-byo-cni.md +[open-service-mesh]: open-service-mesh-about.md +[gpu]: gpu-cluster.md +[multi-instance-gpu]: gpu-multi-instance.md +[gen-2-vms]: cluster-configuration.md#generation-2-virtual-machines +[custom-node-config]: custom-node-configuration.md +[custom-kubelet-parameters]: custom-node-configuration.md#kubelet-custom-configuration |
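The cluster-creation limits called out in the entry above (the first system node pool must be Linux, and a Windows node pool name has a six-character limit) are easiest to see in a CLI example. The following is a minimal sketch, not taken from the source article; the resource group, cluster, and node pool names are placeholder assumptions.

```azurecli
# Sketch: add a Windows Server node pool to an existing AKS cluster.
# The first system node pool stays Linux; the Windows pool name
# ("npwin" here) must be six characters or fewer.
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name npwin \
    --os-type Windows \
    --node-count 1
```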
api-management | Api Management Features | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-features.md | Each API Management [pricing tier](https://aka.ms/apimpricing) offers a distinct > [!IMPORTANT] > * The Developer tier is for non-production use cases and evaluations. It doesn't offer SLA.-> * The Consumption tier isn't available in the US Government cloud or the Microsoft Azure operated by 21Vianet cloud. +> * The Consumption tier isn't available in the US Government cloud or the Microsoft Azure operated by 21Vianet cloud. +> * API Management **v2 tiers** are now in preview, with updated feature availability. [Learn more](v2-service-tiers-overview.md). + | Feature | Consumption | Developer | Basic | Standard | Premium | | -- | -- | | -- | -- | - | |
api-management | Api Management Gateways Overview | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-gateways-overview.md | Related information: * For more information about the API Management service tiers and features, see [Feature-based comparison of the Azure API Management tiers](api-management-features.md). + ## Role of the gateway The API Management *gateway* (also called *data plane* or *runtime*) is the service component that's responsible for proxying API requests, applying policies, and collecting telemetry. |
api-management | Api Management Howto Cache | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-howto-cache.md | What you'll learn: > [!NOTE] > Internal cache is not available in the **Consumption** tier of Azure API Management. You can [use an external Azure Cache for Redis](api-management-howto-cache-external.md) instead.+> +> For feature availability in the v2 tiers (preview), see the [v2 tiers overview](v2-service-tiers-overview.md). + ## Prerequisites |
api-management | Api Management Howto Developer Portal Customize | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-howto-developer-portal-customize.md | Title: Tutorial - Access and customize the developer portal - Azure API Management | Microsoft Docs -description: Follow this to tutorial to learn how to customize the API Management developer portal, an automatically generated, fully customizable website with the documentation of your APIs. +description: In this tutorial, customize the API Management developer portal, an automatically generated, fully customizable website with the documentation of your APIs. Previously updated : 11/21/2022 Last updated : 09/06/2023 You can find more details on the developer portal in the [Azure API Management d Follow the steps below to access the managed version of the portal. 1. In the [Azure portal](https://portal.azure.com), navigate to your API Management instance.-1. Select the **Developer portal** button in the top navigation bar. A new browser tab with an administrative version of the portal will open. -+1. If you created your instance in a v2 service tier that supports the developer portal, first enable the developer portal. + 1. In the left menu, under **Developer portal**, select **Portal settings**. + 1. In the **Portal settings** window, select **Enabled**. Select **Save**. + + It might take a few minutes to enable the developer portal. +1. In the left menu, under **Developer portal**, select **Portal overview**. Then select the **Developer portal** button in the top navigation bar. A new browser tab with an administrative version of the portal will open. ## Developer portal architectural concepts |
api-management | Api Management Howto Manage Protocols Ciphers | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-howto-manage-protocols-ciphers.md | By default, API Management enables TLS 1.2 for client and backend connectivity a :::image type="content" source="media/api-management-howto-manage-protocols-ciphers/api-management-protocols-ciphers.png" alt-text="Screenshot of managing protocols and ciphers in the Azure portal."::: > [!NOTE] > * If you're using the self-hosted gateway, see [self-hosted gateway security](self-hosted-gateway-overview.md#security) to manage TLS protocols and cipher suites. |
api-management | Api Management Using With Internal Vnet | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-using-with-internal-vnet.md | Title: Connect to an internal virtual network using Azure API Management -description: Learn how to set up and configure Azure API Management in a virtual network using internal mode + Title: Deploy Azure API Management instance to internal VNet +description: Learn how to deploy (inject) your Azure API instance to a virtual network in internal mode and access API backends through it. Last updated 01/03/2022 -# Connect to a virtual network in internal mode using Azure API Management -With Azure virtual networks (VNets), Azure API Management can manage internet-inaccessible APIs using several VPN technologies to make the connection. For VNet connectivity options, requirements, and considerations, see [Using a virtual network with Azure API Management](virtual-network-concepts.md). +# Deploy your Azure API Management instance to a virtual network - internal mode ++Azure API Management can be deployed (injected) inside an Azure virtual network (VNet) to access backend services within the network. For VNet connectivity options, requirements, and considerations, see [Using a virtual network with Azure API Management](virtual-network-concepts.md). This article explains how to set up VNet connectivity for your API Management instance in the *internal* mode. In this mode, you can only access the following API Management endpoints within a VNet whose access you control. * The API gateway Use API Management in internal mode to: :::image type="content" source="media/api-management-using-with-internal-vnet/api-management-vnet-internal.png" alt-text="Connect to internal VNet"::: -For configurations specific to the *external* mode, where the API Management endpoints are accessible from the public internet, and backend services are located in the network, see [Connect to a virtual network using Azure API Management](api-management-using-with-vnet.md). +For configurations specific to the *external* mode, where the API Management endpoints are accessible from the public internet, and backend services are located in the network, see [Deploy your Azure API Management instance to a virtual network - external mode](api-management-using-with-vnet.md). [!INCLUDE [updated-for-az](../../includes/updated-for-az.md)] |
api-management | Api Management Using With Vnet | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-using-with-vnet.md | Title: Connect to a virtual network using Azure API Management -description: Learn how to set up a connection to a virtual network in Azure API Management and access API backends through it. + Title: Deploy Azure API Management instance to external VNet +description: Learn how to deploy (inject) your Azure API instance to a virtual network in external mode and access API backends through it. -# Connect to a virtual network using Azure API Management +# Deploy your Azure API Management instance to a virtual network - external mode -Azure API Management can be deployed inside an Azure virtual network (VNet) to access backend services within the network. For VNet connectivity options, requirements, and considerations, see [Using a virtual network with Azure API Management](virtual-network-concepts.md). +Azure API Management can be deployed (injected) inside an Azure virtual network (VNet) to access backend services within the network. For VNet connectivity options, requirements, and considerations, see [Using a virtual network with Azure API Management](virtual-network-concepts.md). This article explains how to set up VNet connectivity for your API Management instance in the *external* mode, where the developer portal, API gateway, and other API Management endpoints are accessible from the public internet, and backend services are located in the network. :::image type="content" source="media/api-management-using-with-vnet/api-management-vnet-external.png" alt-text="Connect to external VNet"::: -For configurations specific to the *internal* mode, where the endpoints are accessible only within the VNet, see [Connect to an internal virtual network using Azure API Management](./api-management-using-with-internal-vnet.md). +For configurations specific to the *internal* mode, where the endpoints are accessible only within the VNet, see [Deploy your Azure API Management instance to a virtual network - internal mode](./api-management-using-with-internal-vnet.md). [!INCLUDE [updated-for-az](../../includes/updated-for-az.md)] |
api-management | Compute Infrastructure | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/compute-infrastructure.md | description: Learn about the compute platform used to host your API Management s Previously updated : 04/17/2023 Last updated : 09/18/2023 Most new instances created in service tiers other than the Consumption tier are ## What are the compute platforms for API Management? -The following table summarizes the compute platforms currently used for instances in the different API Management service tiers. +The following table summarizes the compute platforms currently used in the **Consumption**, **Developer**, **Basic**, **Standard**, and **Premium** tiers of API Management. | Version | Description | Architecture | Tiers | | -| -| -- | - | The `stv2` platform infrastructure supports several resiliency and security feat Migration steps depend on features enabled in your API Management instance. If the instance isn't injected in a VNet, you can use a migration API. For instances that are VNet-injected, follow manual steps. For details, see the [migration guide](migrate-stv1-to-stv2.md). +## What about the v2 pricing tiers? ++The v2 pricing tiers are a new set of tiers for API Management currently in preview. Hosted on a new, highly scalable and available Azure infrastructure that's different from the `stv1` and `stv2` compute platforms, the v2 tiers aren't affected by the retirement of the `stv1` platform. ++The v2 tiers are designed to make API Management accessible to a broader set of customers and offer flexible options for a wider variety of scenarios. For more information, see [v2 tiers overview](v2-service-tiers-overview.md). + ## Next steps * [Migrate an API Management instance to the stv2 platform](migrate-stv1-to-stv2.md). |
api-management | Integrate Vnet Outbound | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/integrate-vnet-outbound.md | + + Title: Connect API Management instance to a private network | Microsoft Docs +description: Learn how to integrate an Azure API Management instance in the Standard v2 tier with a virtual network to access backend APIs hosted within the network. ++++ Last updated : 09/20/2023+++# Integrate an Azure API Management instance with a private VNet for outbound connections (preview) ++This article guides you through the process of configuring *VNet integration* for your Azure API Management instance so that your API Management instance can make outbound requests to API backends that are isolated in the network. ++When an API Management instance is integrated with a virtual network for outbound requests, the API Management itself is not deployed in a VNet; the gateway and other endpoints remain publicly accessible. In this configuration, the API Management instance can reach both public and network-isolated backend services. ++++## Prerequisites ++- An Azure API Management instance in the [Standard v2](v2-service-tiers-overview.md) pricing tier +- A virtual network with a subnet where your API Management backend APIs are hosted + - The network must be deployed in the same region as your API Management instance +- (Optional) For testing, a sample backend API hosted within a different subnet in the virtual network. For example, see [Tutorial: Establish Azure Functions private site access](../azure-functions/functions-create-private-site-access.md). ++## Delegate the subnet ++The subnet used for integration must be delegated to the **Microsoft.Web/serverFarms** service. In the subnet settings, in **Delegate subnet to a service**, select **Microsoft.Web/serverFarms**. +++For details, see [Add or remove a subnet delegation](../virtual-network/manage-subnet-delegation.md). ++## Enable VNet integration ++This section will guide you through the process of enabling VNet integration for your Azure API Management instance. ++1. In the [Azure portal](https://portal.azure.com), navigate to your API Management instance. +1. In the left menu, under **Deployment + Infrastructure**, select **Network**. +1. On the **Outbound traffic** card, select **VNET integration**. ++ :::image type="content" source="media/integrate-vnet-outbound/integrate-vnet.png" lightbox="media/integrate-vnet-outbound/integrate-vnet.png" alt-text="Screenshot of VNet integration in the portal."::: ++1. In the **Virtual network** blade, enable the **Virtual network** checkbox. +1. Select the location of your API Management instance. +1. In **Virtual network**, select the virtual network and the delegated subnet that you want to integrate. +1. Select **Apply**, and then select **Save**. The VNet is integrated. ++ :::image type="content" source="media/integrate-vnet-outbound/vnet-settings.png" lightbox="media/integrate-vnet-outbound/vnet-settings.png" alt-text="Screenshot of VNet settings in the portal."::: ++## (Optional) Test VNet integration ++If you have an API hosted in the virtual network, you can import it to your Management instance and test the VNet integration. For basic steps, see [Import and publish an API](import-and-publish.md). +++## Related content ++* [Use a virtual network with Azure API Management](virtual-network-concepts.md) +++ |
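As a companion to the subnet-delegation step described in the entry above, the delegation can also be applied with the Azure CLI instead of the portal. This is a minimal sketch rather than the article's own procedure; the resource group, VNet, and subnet names are placeholders.

```azurecli
# Sketch: delegate the integration subnet to Microsoft.Web/serverFarms
# before enabling outbound VNet integration for a Standard v2 instance.
az network vnet subnet update \
    --resource-group myResourceGroup \
    --vnet-name myVNet \
    --name apim-integration-subnet \
    --delegations Microsoft.Web/serverFarms
```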
api-management | V2 Service Tiers Overview | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/v2-service-tiers-overview.md | + + Title: Azure API Management - v2 tiers (preview) +description: Introduction to key scenarios, capabilities, and concepts of the v2 tiers (SKUs) of the Azure API Management service. The v2 tiers are in preview. +++editor: '' + ++ Last updated : 10/02/2023+++++# New Azure API Management tiers (preview) ++We're introducing a new set of pricing tiers (SKUs) for Azure API Management: the *v2 tiers*. The new tiers are built on a new, more reliable and scalable platform and are designed to make API Management accessible to a broader set of customers and offer flexible options for a wider variety of scenarios. ++Currently in preview, the following v2 tiers are available: ++* **Basic v2** - The Basic v2 tier is designed for development and testing scenarios, and is supported with an SLA. In the Basic v2 tier, the developer portal is an optional add-on. ++* **Standard v2** - Standard v2 is a production-ready tier with support planned for advanced API Management features previously available only in a Premium tier of API Management, including high availability and networking options. ++## Key capabilities ++* **Faster deployment, configuration, and scaling** - Deploy a production-ready API Management instance in minutes. Quickly apply configurations such as certificate and hostname updates. Scale a Basic v2 or Standard v2 instance quickly to up to 10 units to meet the needs of your API management workloads. ++* **Simplified networking** - The Standard v2 tier supports [outbound connections](#networking-options) to network-isolated backends. ++* **More options for production workloads** - The v2 tiers are all supported with an SLA. Upgrade from Basic v2 to Standard v2 to add more production options. ++* **Developer portal options** - Enable the [developer portal](api-management-howto-developer-portal.md) when you're ready to let API consumers discover your APIs. The developer portal is included in the Standard v2 tier, and is an add-on in the Basic v2 tier. ++## Networking options ++In preview, the v2 tiers currently support the following options to limit network traffic from your API Management instance to protected API backends: +++* **Standard v2** ++ **Outbound** - VNet integration to allow your API Management instance to reach API backends that are isolated in a VNet. The API Management gateway, management plane, and developer portal remain publicly accessible from the internet. The VNet must be in the same region as the API Management instance. [Learn more](integrate-vnet-outbound.md). ++ +## Features and limitations ++### API version ++The v2 tiers are supported in API Management API version **2023-03-01-preview** or later. ++### Supported regions ++In preview, the v2 tiers are available in the following regions: ++* East US +* South Central US +* West US +* France Central +* North Europe +* West Europe +* UK South +* Brazil South +* Australia East +* Australia Southeast +* East Asia ++### Feature availability ++Most capabilities of the existing (v1) tiers are planned for the v2 tiers. 
However, the following capabilities aren't supported in the v2 tiers: ++* API Management service configuration using Git +* Back up and restore of API Management instance +* Enabling Azure DDoS Protection ++### Preview limitations ++Currently, the following API Management capabilities are unavailable in the v2 tiers preview and are planned for later release. Where indicated, certain features are planned only for the Standard v2 tier. Features may be enabled during the preview period. +++**Infrastructure and networking** +* Zone redundancy (*Standard v2*) +* Multi-region deployment (*Standard v2*) +* Multiple custom domain names (*Standard v2*) +* Capacity metric +* Autoscaling +* Built-in analytics +* Inbound connection using a private endpoint +* Upgrade to v2 tiers from v1 tiers +* Workspaces ++**Developer portal** +* Delegation of user registration and product subscription +* Reports ++**Gateway** +* Self-hosted gateway (*Standard v2*) +* Management of Websocket APIs +* Rate limit by key and quota by key policies +* Cipher configuration +* Client certificate renegotiation +* Requests to the gateway over localhost ++## Deployment ++Deploy an instance of the Basic v2 or Standard v2 tier using the Azure portal, Azure REST API, or Azure Resource Manager or Bicep template. ++## Frequently asked questions ++### Q: Can I migrate from my existing API Management instance to a new v2 tier instance? ++A: No. Currently you can't migrate an existing API Management instance (in the Consumption, Developer, Basic, Standard, or Premium tier) to a new v2 tier instance. Currently the new tiers are available for newly created service instances only. ++### Q: What's the relationship between the stv2 compute platform and the v2 tiers? ++A: They're not related. stv2 is a [compute platform](compute-infrastructure.md) version of the Developer, Basic, Standard, and Premium tier service instances. stv2 is a successor to the stv1 platform [scheduled for retirement in 2024](./breaking-changes/stv1-platform-retirement-august-2024.md). ++### Q: Will I still be able to provision Basic or Standard tier services? ++A: Yes, there are no changes to the Basic or Standard tiers. ++### Q: What is the difference between VNet integration in Standard v2 tier and VNet support in the Premium tier? ++A: A Standard v2 service instance can be integrated with a VNet to provide secure access to the backends residing there. A Standard v2 service instance integrated with a VNet will have a public IP address that can be secured separately, via Private Link, if necessary. The Premium tier supports a [fully private integration](api-management-using-with-internal-vnet.md) with a VNet (often referred to as injection into VNet) without exposing a public IP address. ++### Q: Can I deploy an instance of the Basic v2 or Standard v2 tier entirely in my VNet? ++A: No, such a deployment is only supported in the Premium tier. ++## Related content ++* Learn more about the API Management [tiers](api-management-features.md). |
api-management | Virtual Network Concepts | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/virtual-network-concepts.md | Title: Azure API Management with an Azure virtual network -description: Learn about scenarios and requirements to secure your API Management instance using an Azure virtual network. +description: Learn about scenarios and requirements to secure inbound and outbound traffic for your API Management instance using an Azure virtual network. Previously updated : 03/09/2023 Last updated : 09/14/2023 -# Use a virtual network with Azure API Management +# Use a virtual network to secure inbound and outbound traffic for Azure API Management -API Management provides several options to secure access to your API Management instance and APIs using an Azure virtual network. API Management supports the following options, which are mutually exclusive: +API Management provides several options to secure access to your API Management instance and APIs using an Azure virtual network. API Management supports the following options. Available options depend on the [service tier](api-management-features.md) of your API Management instance. -* **Integration (injection)** of the API Management instance into the virtual network, enabling the gateway to access resources in the network. - You can choose one of two integration modes: *external* or *internal*. They differ in whether inbound connectivity to the gateway and other API Management endpoints is allowed from the internet or only from within the virtual network. +* **Injection** of the API Management instance into a subnet in the virtual network, enabling the gateway to access resources in the network. -* **Enabling secure and private inbound connectivity** to the API Management gateway using a *private endpoint*. + You can choose one of two injection modes: *external* or *internal*. They differ in whether inbound connectivity to the gateway and other API Management endpoints is allowed from the internet or only from within the virtual network. +* **Enabling secure and private inbound connectivity** to the API Management gateway using a *private endpoint*. + The following table compares virtual networking options. For more information, see later sections of this article and links to detailed guidance. |Networking model |Supported tiers |Supported components |Supported traffic |Usage scenario | |||||-|-|**[Virtual network - external](#virtual-network-integration)** | Developer, Premium | Developer portal, gateway, management plane, and Git repository | Inbound and outbound traffic can be allowed to internet, peered virtual networks, Express Route, and S2S VPN connections. | External access to private and on-premises backends -|**[Virtual network - internal](#virtual-network-integration)** | Developer, Premium | Developer portal, gateway, management plane, and Git repository. | Inbound and outbound traffic can be allowed to peered virtual networks, Express Route, and S2S VPN connections. | Internal access to private and on-premises backends -|**[Inbound private endpoint](#inbound-private-endpoint)** | Developer, Basic, Standard, Premium | Gateway only (managed gateway supported, self-hosted gateway not supported). | Only inbound traffic can be allowed from internet, peered virtual networks, Express Route, and S2S VPN connections. 
| Secure client connection to API Management gateway | +|**[Virtual network injection - external](#virtual-network-injection)** | Developer, Premium | Developer portal, gateway, management plane, and Git repository | Inbound and outbound traffic can be allowed to internet, peered virtual networks, Express Route, and S2S VPN connections. | External access to private and on-premises backends +|**[Virtual network injection - internal](#virtual-network-injection)** | Developer, Premium | Developer portal, gateway, management plane, and Git repository. | Inbound and outbound traffic can be allowed to peered virtual networks, Express Route, and S2S VPN connections. | Internal access to private and on-premises backends +|**[Inbound private endpoint](#inbound-private-endpoint)** | Developer, Basic, Standard, Premium | Gateway only (managed gateway supported, self-hosted gateway not supported). | Only inbound traffic can be allowed from internet, peered virtual networks, Express Route, and S2S VPN connections. | Secure client connection to API Management gateway | ++ -## Virtual network integration -With Azure virtual networks (VNets), you can place ("inject") your API Management instance in a non-internet-routable network to which you control access. In a virtual network, your API Management instance can securely access other networked Azure resources and also connect to on-premises networks using various VPN technologies. To learn more about Azure VNets, start with the information in the [Azure Virtual Network Overview](../virtual-network/virtual-networks-overview.md). +## Virtual network injection +With VNet injection, deploy ("inject") your API Management instance in a subnet in a non-internet-routable network to which you control access. In the virtual network, your API Management instance can securely access other networked Azure resources and also connect to on-premises networks using various VPN technologies. To learn more about Azure VNets, start with the information in the [Azure Virtual Network Overview](../virtual-network/virtual-networks-overview.md). You can use the Azure portal, Azure CLI, Azure Resource Manager templates, or other tools for the configuration. You control inbound and outbound traffic into the subnet in which API Management is deployed by using [network security groups](../virtual-network/network-security-groups-overview.md). For detailed deployment steps and network configuration, see: -* [Connect to an external virtual network using Azure API Management](./api-management-using-with-vnet.md). -* [Connect to an internal virtual network using Azure API Management](./api-management-using-with-internal-vnet.md). +* [Deploy your API Management instance to a virtual network - external mode](./api-management-using-with-vnet.md). +* [Deploy your API Management instance to a virtual network - internal mode](./api-management-using-with-internal-vnet.md). ### Access options Using a virtual network, you can configure the developer portal, API gateway, and other API Management endpoints to be accessible either from the internet (external mode) or only within the VNet (internal mode). * **External** - The API Management endpoints are accessible from the public internet via an external load balancer. The gateway can access resources within the VNet. - :::image type="content" source="media/virtual-network-concepts/api-management-vnet-external.png" alt-text="Diagram showing a connection to external VNet." 
lightbox="media/virtual-network-concepts/api-management-vnet-external.png"::: + :::image type="content" source="media/virtual-network-concepts/api-management-vnet-external.png" alt-text="Diagram showing a connection to external VNet." ::: Use API Management in external mode to access backend services deployed in the virtual network. Using a virtual network, you can configure the developer portal, API gateway, an * Manage your APIs hosted in multiple geographic locations, using a single gateway endpoint. -### Network resource requirements +### Network resource requirements for injection -The following are virtual network resource requirements for API Management. Some requirements differ depending on the version (`stv2` or `stv1`) of the [compute platform](compute-infrastructure.md) hosting your API Management instance. +The following are virtual network resource requirements for API Management injection into a VNet. Some requirements differ depending on the version (`stv2` or `stv1`) of the [compute platform](compute-infrastructure.md) hosting your API Management instance. #### [stv2](#tab/stv2) One example is to deploy an API Management instance in an internal virtual netwo :::image type="content" source="media/virtual-network-concepts/api-management-application-gateway.png" alt-text="Diagram showing Application Gateway in front of API Management instance." lightbox="media/virtual-network-concepts/api-management-application-gateway.png"::: -For more information, see [Integrate API Management in an internal virtual network with Application Gateway](api-management-howto-integrate-internal-vnet-appgateway.md). +For more information, see [Deploy API Management in an internal virtual network with Application Gateway](api-management-howto-integrate-internal-vnet-appgateway.md). ## Next steps For more information, see [Integrate API Management in an internal virtual netwo Learn more about: Virtual network configuration with API Management:-* [Connect to an external virtual network using Azure API Management](./api-management-using-with-vnet.md). -* [Connect to an internal virtual network using Azure API Management](./api-management-using-with-internal-vnet.md). +* [Deploy your Azure API Management instance to a virtual network - external mode](./api-management-using-with-vnet.md). +* [Deploy your Azure API Management instance to a virtual network - internal mode](./api-management-using-with-internal-vnet.md). * [Connect privately to API Management using a private endpoint](private-endpoint.md) * [Defend your Azure API Management instance against DDoS attacks](protect-with-ddos-protection.md) |
azure-arc | Upgrade | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-arc/resource-bridge/upgrade.md | This article describes how Arc resource bridge (preview) is upgraded and the two ## Prerequisites -In order to upgrade resource bridge, its status must be online and the [credentials in the appliance VM](maintenance.md#update-credentials-in-the-appliance-vm) must be valid. +In order to upgrade Arc resource bridge, its status must be online and the [credentials in the appliance VM](maintenance.md#update-credentials-in-the-appliance-vm) must be valid. There must be sufficient space on the management machine and appliance VM to download required images (~3.5 GB). For VMware, a new template is created. az arcappliance show --resource-group [REQUIRED] --name [REQUIRED] ## Manual upgrade -Arc resource bridge can be manually upgraded from the management machine. The management machine must have the kubeconfig and appliance configuration files stored locally. Manual upgrade generally takes between 30-90 minutes, depending on network speeds. +Arc resource bridge can be manually upgraded from the management machine. You must meet all upgrade prerequisites before attempting to upgrade. The management machine must have the kubeconfig and appliance configuration files stored locally. Manual upgrade generally takes between 30-90 minutes, depending on network speeds. To manually upgrade your Arc resource bridge, make sure you have installed the latest `az arcappliance` CLI extension by running the extension upgrade command from the management machine: To manually upgrade your resource bridge, use the following command: az arcappliance upgrade <private cloud> --config-file <file path to ARBname-appliance.yaml> ``` -For example: `az arcappliance upgrade vmware --config-file c:\contosoARB01-appliance.yaml` +For example, to upgrade a resource bridge on VMware: `az arcappliance upgrade vmware --config-file c:\contosoARB01-appliance.yaml` ++For example, to upgrade a resource bridge on Azure Stack HCI, run: `az arcappliance upgrade hci --config-file c:\contosoARB01-appliance.yaml` ## Private cloud providers Partner products that use Arc resource bridge may choose to handle upgrades differently, including enabling cloud-managed upgrade by default. This article will be updated to reflect any such changes. -[Azure Arc VM management (preview) on Azure Stack HCI](/azure-stack/hci/manage/azure-arc-vm-management-overview) handles upgrades across all components as a "validated recipe" package, and upgrades are applied using the LCM tool. You must manually apply the packaged upgrade using the LCM tool. +[Azure Arc VM management (preview) on Azure Stack HCI](/azure-stack/hci/manage/azure-arc-vm-management-overview) supports upgrade of an Arc resource bridge on Azure Stack HCI, version 22H2 up until Arc resource bridge version 1.0.14 and `az arcappliance` CLI extension version 0.2.33. These upgrades can be done through manual upgrade or a support request for cloud-managed upgrade. For additional upgrades afterwards, you must transition to Azure Stack HCI, version 23H2 (preview). In version 23H2 (preview), the LCM tool manages upgrades across all components as a "validated recipe" package. For more information, visit the [Arc VM management FAQ page](/azure-stack/hci/manage/faqs-arc-enabled-vms). ## Version releases If an Arc resource bridge is unable to be upgraded to a supported version, you m - Learn about [Arc resource bridge maintenance operations](maintenance.md). 
- Learn about [troubleshooting Arc resource bridge](troubleshoot-resource-bridge.md).+ |
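The manual-upgrade flow in the entry above can be summarized in a few commands. The sketch below assumes a VMware deployment and reuses the example configuration file path from the entry; the resource group and appliance name are placeholders.

```azurecli
# Sketch: refresh the arcappliance CLI extension, confirm the resource
# bridge is online, then run the manual upgrade (VMware example).
az extension add --upgrade --name arcappliance
az arcappliance show --resource-group myResourceGroup --name contosoARB01
az arcappliance upgrade vmware --config-file "c:\contosoARB01-appliance.yaml"
```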
azure-arc | License Extended Security Updates | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-arc/servers/license-extended-security-updates.md | Flexibility is critical when enrolling end of support infrastructure in Extended When provisioning WS2012 ESU licenses, you need to specify: * Either virtual core or physical core license-* Standard or datacenter license -* Attest to the number of associated cores (broken down by the number of 2-core and 16-core packs). +* Standard or Datacenter license ++You'll also need to attest to the number of associated cores (broken down by the number of 2-core and 16-core packs). To assist with the license provisioning process, this article provides general guidance and sample customer scenarios for planning your deployment of WS2012 ESUs through Azure Arc. If you choose to license based on virtual cores, the licensing requires a minimu An additional scenario (scenario 1, below) is a candidate for VM/Virtual core licensing when the WS2012 VMs are running on a newer Windows Server host (that is, Windows Server 2016 or later). > [!IMPORTANT]-> In all cases, you are required to attest to their conformance with SA or SPLA. There is no exception for these requirements. Software Assurance or an equivalent Server Subscription is required for you to purchase Extended Security Updates on-premises and in hosted environments. You will be able to purchase Extended Security Updates from Enterprise Agreement (EA), Enterprise Subscription Agreement (EAS), a Server & Cloud Enrollment (SCE), and Enrollment for Education Solutions (EES). On Azure, you do not need Software Assurance to get free Extended Security Updates, but Software Assurance or Server Subscription is required to take advantage of the Azure Hybrid Benefit. -> +> Customers that choose virtual core licensing will always be charged at the Standard edition rate, even if the actual operating system used is Datacenter edition. Additionally, virtual core licensing is not available for physical servers. +> ++### SA/SPLA conformance ++In all cases, you're required to attest to conformance with SA or SPLA. There is no exception for these requirements. Software Assurance or an equivalent Server Subscription is required for you to purchase Extended Security Updates on-premises and in hosted environments. You will be able to purchase Extended Security Updates from Enterprise Agreement (EA), Enterprise Subscription Agreement (EAS), a Server & Cloud Enrollment (SCE), and Enrollment for Education Solutions (EES). On Azure, you do not need Software Assurance to get free Extended Security Updates, but Software Assurance or Server Subscription is required to take advantage of the Azure Hybrid Benefit. ## Cost savings with migration and modernization of workloads |
azure-netapp-files | Enable Continuous Availability Existing SMB | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-netapp-files/enable-continuous-availability-existing-SMB.md | You can enable the SMB Continuous Availability (CA) feature when you [create a n >[!IMPORTANT] > You should enable Continuous Availability for [Citrix App Layering](https://docs.citrix.com/en-us/citrix-app-layering/4.html), SQL Server, and [FSLogix user profile containers](../virtual-desktop/create-fslogix-profile-container.md). Using SMB Continuous Availability shares for any other workload is not supported. This feature is currently supported on Windows SQL Server. Linux SQL Server is not currently supported. > If you are using a non-administrator (domain) account to install SQL Server, ensure that the account has the required security privilege assigned. If the domain account does not have the required security privilege (`SeSecurityPrivilege`), and the privilege cannot be set at the domain level, you can grant the privilege to the account by using the **Security privilege users** field of Active Directory connections. See [Create an Active Directory connection](create-active-directory-connections.md#create-an-active-directory-connection).++>[!IMPORTANT] +> Change notifications are not supported with Continuously Available shares in Azure NetApp Files. ## Steps |
azure-resource-manager | Private Module Registry | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-resource-manager/bicep/private-module-registry.md | To see the published module in the portal: You're now ready to reference the file in the registry from a Bicep file. For examples of the syntax to use for referencing an external module, see [Bicep modules](modules.md). ++## Working with Bicep registry files ++When using Bicep files that are hosted in a remote registry, it's important to understand how your local machine interacts with the registry. When you first declare the reference to the registry, your local editor tries to communicate with the Azure Container Registry and downloads a copy of the module to your local cache. ++The local cache is found in: ++- On Windows ++ ```path + %USERPROFILE%\.bicep\br\<registry-name>.azurecr.io\<module-path>\<tag> + ``` ++- On Linux ++ ```path + /home/<username>/.bicep + ``` ++- On Mac ++ ```path + ~/.bicep + ``` ++Any changes made to the remote registry aren't recognized by your local machine until a `restore` has been run for the file that includes the registry reference. ++```azurecli +az bicep restore --file <bicep-file> [--force] +``` ++For more information, see the [`restore` command](bicep-cli.md#restore). ++ ## Next steps ++* To learn about modules, see [Bicep modules](modules.md). |
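To make the cache behavior described in the entry above concrete, the following sketch pairs a module publish with the `restore` that refreshes the local cache. The registry name, module path, tag, and file names are illustrative assumptions, not values from the article.

```azurecli
# Sketch: publish an updated module to a private registry, then restore the
# Bicep file that references it so the local cache picks up the new version.
az bicep publish --file storage.bicep --target br:exampleregistry.azurecr.io/bicep/modules/storage:v1
az bicep restore --file main.bicep --force
```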
azure-signalr | Howto Private Endpoints | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-signalr/howto-private-endpoints.md | Clients in VNets with existing private endpoints face constraints when accessing This constraint is a result of the DNS changes made when Azure SignalR Service S2 creates a private endpoint. -### Network Security Group rules for subnets with private endpoints --Currently, you can't configure [Network Security Group](../virtual-network/network-security-groups-overview.md) (NSG) rules and user-defined routes for private endpoints. NSG rules applied to the subnet hosting the private endpoint are applied to the private endpoint. A limited workaround for this issue is to implement your access rules for private endpoints on the source subnets, though this approach may require a higher management overhead. - ## Next steps - [Configure Network Access Control](howto-network-access-control.md) |
azure-web-pubsub | Howto Secure Private Endpoints | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-web-pubsub/howto-secure-private-endpoints.md | The Azure Web PubSub service free tier instance cannot integrate with private en Clients in VNets with existing private endpoints face constraints when accessing other Azure Web PubSub service instances that have private endpoints. For instance, suppose a VNet N1 has a private endpoint for an Azure Web PubSub service instance W1. If Azure Web PubSub service W2 has a private endpoint in a VNet N2, then clients in VNet N1 must also access Azure Web PubSub service W2 using a private endpoint. If Azure Web PubSub service W2 does not have any private endpoints, then clients in VNet N1 can access Azure Web PubSub service in that account without a private endpoint. This constraint is a result of the DNS changes made when Azure Web PubSub service W2 creates a private endpoint.--### Network Security Group rules for subnets with private endpoints --Currently, you can't configure [Network Security Group](../virtual-network/network-security-groups-overview.md) (NSG) rules and user-defined routes for private endpoints. NSG rules applied to the subnet hosting the private endpoint are applied to the private endpoint. A limited workaround for this issue is to implement your access rules for private endpoints on the source subnets, though this approach may require a higher management overhead. - |
backup | Backup Support Matrix | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/backup/backup-support-matrix.md | Backup supports the compression of backup traffic, as summarized in the followin **Maximum recovery points per protected instance (machine or workload)** | 9,999 **Maximum expiry time for a recovery point** | No limit **Maximum backup frequency to DPM/MABS** | Every 15 minutes for SQL Server<br/><br/> Once an hour for other workloads-**Maximum backup frequency to vault** | **On-premises Windows machines or Azure VMs running MARS:** Three per day<br/><br/> **DPM/MABS:** Two per day<br/><br/> **Azure VM backup:** One per day +**Maximum backup frequency to vault** | **On-premises Windows machines or Azure VMs running MARS:** Three per day. A maximum of 22 TB of data change is supported between backups.<br/><br/> **DPM/MABS:** Two per day<br/><br/> **Azure VM backup:** One per day **Recovery point retention** | Daily, weekly, monthly, yearly **Maximum retention period** | Depends on backup frequency **Recovery points on DPM/MABS disk** | 64 for file servers; 448 for app servers <br/><br/>Unlimited tape recovery points for on-premises DPM |
cdn | Cdn Features | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cdn/cdn-features.md | The following table compares the features available with each product. | [Query string caching](cdn-query-string.md) | **✓** |**✓** |**✓** |**✓** | | IPv4/IPv6 dual-stack | **✓** |**✓** |**✓** |**✓** | | [HTTP/2 support](cdn-http2.md) | **✓** |**✓** |**✓** |**✓** |+| [Routing preference unmetered](../virtual-network/ip-services/routing-preference-unmetered.md) | |**✓** |**✓** |**✓** | |||| **Security** | **Standard Microsoft** | **Standard Akamai** | **Standard Edgio** | **Premium Edgio** | | HTTPS support with CDN endpoint | **✓** |**✓** |**✓** |**✓** | |
cloud-shell | Quickstart | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cloud-shell/quickstart.md | description: Learn how to start using Azure Cloud Shell. ms.contributor: jahelmic Previously updated : 03/06/2023 Last updated : 10/09/2023 tags: azure-resource-manager Title: Quickstart for Azure Cloud Shell # Quickstart for Azure Cloud Shell -This document details how to use Bash and PowerShell in Azure Cloud Shell from the -[Azure portal][03]. +This document details how to get started using Azure Cloud Shell. ++## Prerequisites ++Before you can use Azure Cloud Shell, you must register the **Microsoft.CloudShell** resource +provider. Access to resources is enabled through provider namespaces that must be registered in your +subscription. You only need to register the namespace once per subscription. ++To see all resource providers, and the registration status for your subscription: ++1. Sign in to the [Azure portal][03]. +1. On the Azure portal menu, search for **Subscriptions**. Select it from the available options. +1. Select the subscription you want to view. +1. On the left menu, under **Settings**, select **Resource providers**. +1. In the search box, enter `cloudshell` to search for the resource provider. +1. Select the **Microsoft.CloudShell** resource provider register from the provider list. +1. Select **Register** to change the status from **unregistered** to **Registered**. ++ :::image type="content" source="./media/quickstart/resource-provider.png" alt-text="Screenshot of selecting resource providers in the Azure portal."::: ## Start Cloud Shell Cloud Shell allows you to select either **Bash** or **PowerShell** for your comm ![Screenshot showing the shell selector.][04] -### Registering your subscription with Azure Cloud Shell --Azure Cloud Shell needs access to manage resources. Access is provided through namespaces that must -be registered to your subscription. Use the following commands to register the -**Microsoft.CloudShell** namespace in your subscription: --<!-- markdownlint-disable MD023 --> -<!-- markdownlint-disable MD024 --> -<!-- markdownlint-disable MD051 --> -#### [Azure CLI](#tab/azurecli) --```azurecli-interactive -az account set --subscription <Subscription Name or Id> -az provider register --namespace Microsoft.CloudShell -``` --#### [Azure PowerShell](#tab/powershell) --```azurepowershell-interactive -Select-AzSubscription -SubscriptionId <SubscriptionId> -Register-AzResourceProvider -ProviderNamespace Microsoft.CloudShell -``` -<!-- markdownlint-enable MD023 --> -<!-- markdownlint-enable MD024 --> -<!-- markdownlint-enable MD051 --> ----> [!NOTE] -> You only need to register the namespace once per subscription. - ### Set your subscription 1. List subscriptions you have access to. -<!-- markdownlint-disable MD023 --> -<!-- markdownlint-disable MD024 --> -<!-- markdownlint-disable MD051 --> + <!-- markdownlint-disable MD023 MD024 MD051--> #### [Azure CLI](#tab/azurecli) ```azurecli-interactive Register-AzResourceProvider -ProviderNamespace Microsoft.CloudShell ```azurepowershell-interactive Get-AzSubscription ```-<!-- markdownlint-enable MD023 --> -<!-- markdownlint-enable MD024 --> -<!-- markdownlint-enable MD051 --> -- + <!-- markdownlint-enable MD023 MD024 MD051--> 1. 
Set your preferred subscription: -<!-- markdownlint-disable MD023 --> -<!-- markdownlint-disable MD024 --> -<!-- markdownlint-disable MD051 --> + <!-- markdownlint-disable MD023 MD024 MD051--> #### [Azure CLI](#tab/azurecli) ```azurecli-interactive Register-AzResourceProvider -ProviderNamespace Microsoft.CloudShell ```azurepowershell-interactive Set-AzContext -Subscription <SubscriptionId> ```-<!-- markdownlint-enable MD023 --> -<!-- markdownlint-enable MD024 --> -<!-- markdownlint-enable MD051 --> -+ <!-- markdownlint-enable MD023 MD024 MD051--> > [!TIP] Register-AzResourceProvider -ProviderNamespace Microsoft.CloudShell ### Get a list of Azure commands -<!-- markdownlint-disable MD023 --> -<!-- markdownlint-disable MD024--> -<!-- markdownlint-disable MD051 --> +<!-- markdownlint-disable MD023 MD024 MD051--> #### [Azure CLI](#tab/azurecli) Run the following command to see a list of all Azure CLI commands. Run the following commands to get a list the Azure PowerShell commands that appl cd 'Azure:/My Subscription/WebApps' Get-AzCommand ```-<!-- markdownlint-enable MD023 --> -<!-- markdownlint-enable MD024 --> -<!-- markdownlint-enable MD051 --> +<!-- markdownlint-enable MD023 MD024 MD051--> |
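The resource provider registration described in the entry above can also be performed from a command line if you prefer scripting over the portal steps. This sketch uses the Azure CLI commands that appeared in the earlier version of the article; the subscription name is a placeholder.

```azurecli
# Sketch: register the Microsoft.CloudShell resource provider and verify
# that its state changes to "Registered".
az account set --subscription "My Subscription"
az provider register --namespace Microsoft.CloudShell
az provider show --namespace Microsoft.CloudShell --query registrationState
```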
communication-services | Manage Audio Filters | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/communication-services/how-tos/calling-sdk/manage-audio-filters.md | Title: Outgoing Audio Filters description: Use Azure Communication Services SDKs to set outgoing audio filters. -+ Last updated 07/27/2023 +zone_pivot_groups: acs-plat-ios-android-windows # Manage audio filters Learn how to manage audio processing features with the Azure Communication Servi - A user access token to enable the calling client. For more information, see [Create and manage access tokens](../../quickstarts/identity/access-tokens.md). - Optional: Complete the quickstart to [add voice calling to your application](../../quickstarts/voice-video-calling/getting-started-with-calling.md) ++ [!INCLUDE [Manage Audio Filters Windows](./includes/manage-audio-filters/manage-audio-filters-windows.md)] ## Next steps - [Learn how to manage calls](./manage-calls.md) |
cost-management-billing | Reporting Get Started | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/costs/reporting-get-started.md | Title: Get started with Cost Management + Billing reporting - Azure -description: This article helps you to get started with Cost Management + Billing to understand, report on, and analyze your invoiced Microsoft Cloud and AWS costs. + Title: Get started with Cost Management reporting - Azure +description: This article helps you to get started with Cost Management to understand, report on, and analyze your invoiced Microsoft Cloud and AWS costs. Last updated 10/18/2022-# Get started with Cost Management + Billing reporting +# Get started with Cost Management reporting -Cost Management + Billing includes several tools to help you understand, report on, and analyze your invoiced Microsoft Cloud and AWS costs. The following sections describe the major reporting components. +Cost Management includes several tools to help you understand, report on, and analyze your invoiced Microsoft Cloud and AWS costs. The following sections describe the major reporting components. ## Cost analysis The app is available for [iOS](https://itunes.apple.com/us/app/microsoft-azure/i - [Explore and analyze costs with cost analysis](quick-acm-cost-analysis.md). - [Analyze Azure costs with the Power BI App](analyze-cost-data-azure-cost-management-power-bi-template-app.md). - [Connect to Microsoft Cost Management data in Power BI Desktop](/power-bi/connect-data/desktop-connect-azure-cost-management).-- [Create and manage exported data](tutorial-export-acm-data.md).+- [Create and manage exported data](tutorial-export-acm-data.md). |
cost-management-billing | Exchange And Refund Azure Reservations | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/reservations/exchange-and-refund-azure-reservations.md | Azure has the following policies for cancellations, exchanges, and refunds. - The new reservation's lifetime commitment should equal or be greater than the returned reservation's remaining commitment. Example: for a three-year reservation that's $100 per month and exchanged after the 18th payment, the new reservation's lifetime commitment should be $1,800 or more (paid monthly or upfront). - The new reservation purchased as part of exchange has a new term starting from the time of exchange. - There's no penalty or annual limits for exchanges.+- Exchanges will be unavailable for all compute reservations - Azure Reserved Virtual Machine Instances, Azure Dedicated Host reservations, and Azure App Services reservations - purchased on or after **January 1, 2024**. Compute reservations purchased **prior to January 1, 2024** will reserve the right to **exchange one more time** after the policy change goes into effect. For more information about the exchange policy change, see [Changes to the Azure reservation exchange policy](reservation-exchange-policy-changes.md). **Refund policies** |
cost-management-billing | Reservation Trade In | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/savings-plan/reservation-trade-in.md | Although compute reservation exchanges will end on January 1, 2024, noncompute r - You must have owner access on the Reservation Order to trade in an existing reservation. You can [Add or change users who can manage a savings plan](manage-savings-plan.md#who-can-manage-a-savings-plan). - To trade in a reservation for a savings plan, you must have Azure RBAC Owner permission on the subscription you plan to use to purchase a savings plan. - EA Admin write permission or Billing profile contributor and higher, which are Cost Management + Billing permissions, are supported only for direct Savings plan purchases. They can't be used for savings plan purchases as a part of a reservation trade-in.+- The new savings plan's lifetime commitment should equal or be greater than the returned reservation's remaining commitment. Example: for a three-year reservation that's $100 per month and exchanged after the 18th payment, the new savings plan's lifetime commitment should be $1,800 or more (paid monthly or upfront). - Microsoft isn't currently charging early termination fees for reservation trade-ins. We might begin charging fees in the future. We currently don't have a date for enabling the fee. ## How to trade in an existing reservation |
data-factory | How To Create Custom Event Trigger | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/data-factory/how-to-create-custom-event-trigger.md | Data Factory expects events to follow the [Event Grid event schema](../event-gri 1. Select your custom topic from the Azure subscription dropdown or manually enter the event topic scope. > [!NOTE]- > To create or modify a custom event trigger in Data Factory, you need to use an Azure account with appropriate role-based access control (Azure RBAC). No additional permission is required. The Data Factory service principle does *not* require special permission to your Event Grid. For more information about access control, see the [Role-based access control](#role-based-access-control) section. + > To create or modify a custom event trigger in Data Factory, you need to use an Azure account with appropriate role-based access control (Azure RBAC). No additional permission is required. The Data Factory service principal does *not* require special permission to your Event Grid. For more information about access control, see the [Role-based access control](#role-based-access-control) section. 1. The **Subject begins with** and **Subject ends with** properties allow you to filter for trigger events. Both properties are optional. |
defender-for-cloud | Defender For Containers Vulnerability Assessment Azure | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md | Container vulnerability assessment powered by Qualys has the following capabilit | Recommendation | Description | Assessment Key |--|--|--| | [Container registry images should have vulnerability findings resolved (powered by Qualys)](https://ms.portal.azure.com/#view/Microsoft_Azure_Security_CloudNativeCompute/ContainerRegistryRecommendationDetailsBlade/assessmentKey/dbd0cb49-b563-45e7-9724-889e799fa648)| Container image vulnerability assessment scans your registry for security vulnerabilities and exposes detailed findings for each image. Resolving the vulnerabilities can greatly improve your containers' security posture and protect them from attacks. | dbd0cb49-b563-45e7-9724-889e799fa648 |- | [Running container images should have vulnerability findings resolved (powered by Qualys)](https://ms.portal.azure.com/#view/Microsoft_Azure_Security_CloudNativeCompute/KubernetesRuntimeVisibilityRecommendationDetailsBlade/assessmentKey/41503391-efa5-47ee-9282-4eff6131462c)ΓÇ»| Container image vulnerability assessment scans container images running on your Kubernetes clusters for security vulnerabilities and exposes detailed findings for each image. Resolving the vulnerabilities can greatly improve your containers' security posture and protect them from attacks. | 41503391-efa5-47ee-9282-4eff6131462c/ | + | [Running container images should have vulnerability findings resolved (powered by Qualys)](https://ms.portal.azure.com/#view/Microsoft_Azure_Security_CloudNativeCompute/KubernetesRuntimeVisibilityRecommendationDetailsBlade/assessmentKey/41503391-efa5-47ee-9282-4eff6131462c)ΓÇ»| Container image vulnerability assessment scans container images running on your Kubernetes clusters for security vulnerabilities and exposes detailed findings for each image. Resolving the vulnerabilities can greatly improve your containers' security posture and protect them from attacks. | 41503391-efa5-47ee-9282-4eff6131462c | - **Query vulnerability information via the Azure Resource Graph** - Ability to query vulnerability information via the [Azure Resource Graph](/azure/governance/resource-graph/overview#how-resource-graph-complements-azure-resource-manager). Learn how to [query recommendations via the ARG](review-security-recommendations.md#review-recommendation-data-in-azure-resource-graph-arg). |
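For the Azure Resource Graph capability mentioned in the entry above, a query along the following lines can surface the registry image vulnerability assessment results. The exact query shape is an assumption for illustration (and it requires the `resource-graph` CLI extension); only the assessment key comes from the entry itself.

```azurecli
# Sketch: query Azure Resource Graph for the registry image vulnerability
# assessment by its assessment key (requires the resource-graph extension).
az graph query -q "securityresources | where type == 'microsoft.security/assessments' | where name == 'dbd0cb49-b563-45e7-9724-889e799fa648' | project id, properties.status.code"
```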
defender-for-cloud | Permissions | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/defender-for-cloud/permissions.md | Title: User roles and permissions description: This article explains how Microsoft Defender for Cloud uses role-based access control to assign permissions to users and identify the permitted actions for each role. Previously updated : 03/06/2023 Last updated : 10/09/2023 # User roles and permissions Defender for Cloud assesses the configuration of your resources to identify secu In addition to the built-in roles, there are two roles specific to Defender for Cloud: - **Security Reader**: A user that belongs to this role has read-only access to Defender for Cloud. The user can view recommendations, alerts, a security policy, and security states, but can't make changes.-- **Security Admin**: A user that belongs to this role has the same access as the Security Reader and can also update the security policy, dismiss alerts and recommendations, and apply recommendations.+- **Security Admin**: A user that belongs to this role has the same access as the Security Reader and can also update the security policy, and dismiss alerts and recommendations. We recommend that you assign the least permissive role needed for users to complete their tasks. For example, assign the Reader role to users who only need to view information about the security health of a resource but not take action, such as applying recommendations or editing policies. |
defender-for-cloud | Upcoming Changes | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/defender-for-cloud/upcoming-changes.md | Title: Important upcoming changes description: Upcoming changes to Microsoft Defender for Cloud that you might need to be aware of and for which you might need to plan Previously updated : 09/04/2023 Last updated : 10/09/2023 # Important upcoming changes to Microsoft Defender for Cloud > [!IMPORTANT] -> The information on this page relates to pre-release products or features, which may be substantially modified before they are commercially released, if ever. Microsoft makes no commitments or warranties, express or implied, with respect to the information provided here. +> The information on this page relates to pre-release products or features, which might be substantially modified before they are commercially released, if ever. Microsoft makes no commitments or warranties, express or implied, with respect to the information provided here. [Defender for Servers](#defender-for-servers) On this page, you can learn about changes that are planned for Defender for Cloud. It describes planned modifications to the product that might affect things like your secure score or workflows. If you're looking for the latest release notes, you can find them in the [What's | Planned change | Estimated date for change | |--|--| | [Replacing the "Key Vaults should have purge protection enabled" recommendation with combined recommendation "Key Vaults should have deletion protection enabled"](#replacing-the-key-vaults-should-have-purge-protection-enabled-recommendation-with-combined-recommendation-key-vaults-should-have-deletion-protection-enabled) | June 2023|-| [Changes to the Defender for DevOps recommendations environment source and resource ID](#changes-to-the-defender-for-devops-recommendations-environment-source-and-resource-id) | August 2023 | | [Preview alerts for DNS servers to be deprecated](#preview-alerts-for-dns-servers-to-be-deprecated) | August 2023 |-| [Deprecate and replace recommendations App Service Client Certificates](#deprecate-and-replace-recommendations-app-service-client-certificates) | August 2023 | | [Classic connectors for multicloud will be retired](#classic-connectors-for-multicloud-will-be-retired) | September 2023 |-| [Replacing secret scanning recommendation results in Defender for DevOps from CredScan with GitHub Advanced Security for Azure DevOps powered secret scanning](#replacing-secret-scanning-recommendation-results-in-defender-for-devops-from-credscan-with-github-advanced-security-for-azure-devops-powered-secret-scanning) | September 2023 | | [Change to the Log Analytics daily cap](#change-to-the-log-analytics-daily-cap) | September 2023 |-| [Deprecating and replacing "Microsoft Defender for Storage plan should be enabled" recommendation](#deprecating-and-replacing-microsoft-defender-for-storage-plan-should-be-enabled-recommendation) | September 2023| -| [DevOps Resource Deduplication for Defender for DevOps](#devops-resource-deduplication-for-defender-for-devops) | September 2023 | +| [DevOps Resource Deduplication for Defender for DevOps](#devops-resource-deduplication-for-defender-for-devops) | November 2023 | +| [Changes to Attack Path's Azure Resource Graph table scheme](#changes-to-attack-paths-azure-resource-graph-table-scheme) | November 2023 | | [Defender for Cloud plan and strategy for the Log Analytics agent deprecation](#defender-for-cloud-plan-and-strategy-for-the-log-analytics-agent-deprecation) | August 2024 | 
-### Replacing secret scanning recommendation results in Defender for DevOps from CredScan with GitHub Advanced Security for Azure DevOps powered secret scanning +### Replacing the "Key Vaults should have purge protection enabled" recommendation with combined recommendation "Key Vaults should have deletion protection enabled" ++**Estimated date for change: June 2023** ++The `Key Vaults should have purge protection enabled` recommendation is deprecated from the (regulatory compliance dashboard/Azure security benchmark initiative) and replaced with a new combined recommendation `Key Vaults should have deletion protection enabled`. ++| Recommendation name | Description | Effect(s) | Version | +|--|--|--|--| +| [Key vaults should have deletion protection enabled](https://ms.portal.azure.com/#view/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2F0b60c0b2-2dc2-4e1c-b5c9-abbed971de53)| A malicious insider in your organization can potentially delete and purge key vaults. Purge protection protects you from insider attacks by enforcing a mandatory retention period for soft deleted key vaults. No one inside your organization or Microsoft will be able to purge your key vaults during the soft delete retention period. | audit, deny, disabled | [2.0.0](https://github.com/Azure/azure-policy/blob/master/built-in-policies/policyDefinitions/Key%20Vault/KeyVault_Recoverable_Audit.json) | ++See the [full index of Azure Policy built-in policy definitions for Key Vault](../key-vault/policy-reference.md) -**Estimated date for change: September 2023** +### Preview alerts for DNS servers to be deprecated -Currently, the recommendations for secret scanning in Azure DevOps repositories by Defender for DevOps are based on the results of CredScan, which is manually run using the Microsoft Security DevOps Extension. However, this mechanism of running secret scanning is being deprecated in September 2023. Instead, you can see secret scanning results generated by GitHub Advanced Security for Azure DevOps (GHAzDO). +**Estimated date for change: August 2023** -As GHAzDO enters Public Preview, we're working towards unifying the secret scanning experience across both GitHub Advanced Security and GHAzDO. This unification enables you to receive detections across all branches, git history, and secret leak protection via push protection to your repositories. This process can all be done with a single button press, without requiring any pipeline runs. +Following quality improvement process, security alerts for DNS servers are set to be deprecated in August. For cloud resources, use [Azure DNS](defender-for-dns-introduction.md) to receive the same security value. -For more information about GHAzDO Secret Scanning, see [Set up secret scanning](/azure/devops/repos/security/configure-github-advanced-security-features#set-up-secret-scanning). 
+The following table lists the alerts to be deprecated: ++| AlertDisplayName | AlertType | +|--|--| +| Communication with suspicious random domain name (Preview) | DNS_RandomizedDomain +| Communication with suspicious domain identified by threat intelligence (Preview) | DNS_ThreatIntelSuspectDomain | +| Digital currency mining activity (Preview) | DNS_CurrencyMining | +| Network intrusion detection signature activation (Preview) | DNS_SuspiciousDomain | +| Attempted communication with suspicious sinkholed domain (Preview) | DNS_SinkholedDomain | +| Communication with possible phishing domain (Preview) | DNS_PhishingDomain| +| Possible data transfer via DNS tunnel (Preview) | DNS_DataObfuscation | +| Possible data exfiltration via DNS tunnel (Preview) | DNS_DataExfiltration | +| Communication with suspicious algorithmically generated domain (Preview) | DNS_DomainGenerationAlgorithm | +| Possible data download via DNS tunnel (Preview) | DNS_DataInfiltration | +| Anonymity network activity (Preview) | DNS_DarkWeb | +| Anonymity network activity using web proxy (Preview) | DNS_DarkWebProxy | ### Classic connectors for multicloud will be retired How to migrate to the native security connectors: - [Connect your AWS account to Defender for Cloud](quickstart-onboard-aws.md) - [Connect your GCP project to Defender for Cloud](quickstart-onboard-gcp.md) -### Defender for Cloud plan and strategy for the Log Analytics agent deprecation +### Change to the Log Analytics daily cap ++Azure monitor offers the capability to [set a daily cap](../azure-monitor/logs/daily-cap.md) on the data that is ingested on your Log analytics workspaces. However, Defender for Cloud security events are currently not supported in those exclusions. ++Starting on September 18, 2023 the Log Analytics Daily Cap will no longer exclude the following set of data types: ++- WindowsEvent +- SecurityAlert +- SecurityBaseline +- SecurityBaselineSummary +- SecurityDetection +- SecurityEvent +- WindowsFirewall +- MaliciousIPCommunication +- LinuxAuditLog +- SysmonEvent +- ProtectionStatus +- Update +- UpdateSummary +- CommonSecurityLog +- Syslog ++At that time, all billable data types will be capped if the daily cap is met. This change improves your ability to fully contain costs from higher-than-expected data ingestion. ++Learn more about [workspaces with Microsoft Defender for Cloud](../azure-monitor/logs/daily-cap.md#workspaces-with-microsoft-defender-for-cloud). -**Estimated date for change: August 2024** -The Azure Log Analytics agent, also known as the Microsoft Monitoring Agent (MMA) will be [retired in August 2024.](https://azure.microsoft.com/updates/were-retiring-the-log-analytics-agent-in-azure-monitor-on-31-august-2024/) As a result, features of the two Defender for Cloud plans that rely on the Log Analytics agent are impacted, and they have updated strategies: [Defender for Servers](#defender-for-servers) and [Defender for SQL Server on machines](#defender-for-sql-server-on-machines). #### Key strategy points The Azure Log Analytics agent, also known as the Microsoft Monitoring Agent (MMA - Defender for Servers MMA-based features and capabilities will be deprecated in their Log Analytics version in August 2024, and delivered over alternative infrastructures, before the MMA deprecation date. - In addition, the currently shared autoprovisioning process that provides the installation and configuration of both agents (MMA/AMA), will be adjusted accordingly. 
+ #### Defender for Servers The following table explains how each capability will be provided after the Log Analytics agent retirement: To ensure the security of your servers and receive all the security updates from Following that, plan your migration plan according to your organization requirements:  -||Azure Monitor agent (AMA) required (for Defender for SQL or other scenarios)|FIM/EPP discovery/Baselined is required as part of Defender for Server|What should I do| -| -- | -- | -- | -- | -| |No |Yes |You can remove MMA starting April 2024, using GA version of Defender for Server capabilities according to your needs (preview versions will be available earlier)  | -| |No |No |You can remove MMA starting now | -| |Yes |No |You can start migration from MMA to AMA now | -| |Yes |Yes |You can either start migration from MMA to AMA starting April 2024 or alternatively, you can use both agents side by side starting now. | +|Azure Monitor agent (AMA) required (for Defender for SQL or other scenarios)|FIM/EPP discovery/Baselined is required as part of Defender for Server|What should I do| +| -- | -- | -- | +|No |Yes |You can remove MMA starting April 2024, using GA version of Defender for Server capabilities according to your needs (preview versions will be available earlier)  | +|No |No |You can remove MMA starting now | +|Yes |No |You can start migration from MMA to AMA now | +|Yes |Yes |You can either start migration from MMA to AMA starting April 2024 or alternatively, you can use both agents side by side starting now. | **Customers with Log analytics Agent** **(MMA) enabled**  The following section describes the planned introduction of a new and improved S | SQL-targeted AMA autoprovisioning GA release | December 2023 | GA release of a SQL-targeted AMA autoprovisioning process. Following the release, it will be defined as the default option for all new customers. | | MMA deprecation | August 2024 | The current MMA autoprovisioning process and its related policy initiative will be deprecated. It can still be used customers, but they won't be eligible for support. | -### Replacing the "Key Vaults should have purge protection enabled" recommendation with combined recommendation "Key Vaults should have deletion protection enabled" --**Estimated date for change: June 2023** --The `Key Vaults should have purge protection enabled` recommendation is deprecated from the (regulatory compliance dashboard/Azure security benchmark initiative) and replaced with a new combined recommendation `Key Vaults should have deletion protection enabled`. --| Recommendation name | Description | Effect(s) | Version | -|--|--|--|--| -| [Key vaults should have deletion protection enabled](https://ms.portal.azure.com/#view/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2F0b60c0b2-2dc2-4e1c-b5c9-abbed971de53)| A malicious insider in your organization can potentially delete and purge key vaults. Purge protection protects you from insider attacks by enforcing a mandatory retention period for soft deleted key vaults. No one inside your organization or Microsoft will be able to purge your key vaults during the soft delete retention period. 
| audit, deny, disabled | [2.0.0](https://github.com/Azure/azure-policy/blob/master/built-in-policies/policyDefinitions/Key%20Vault/KeyVault_Recoverable_Audit.json) | --See the [full index of Azure Policy built-in policy definitions for Key Vault](../key-vault/policy-reference.md) --### Changes to the Defender for DevOps recommendations environment source and resource ID --**Estimated date for change: August 2023** --The Security DevOps recommendations will be updated to align with the overall Microsoft Defender for Cloud features and experience. Affected recommendations will point to a new recommendation source environment and have an updated resource ID. --Security DevOps recommendations impacted: --- Code repositories should have code scanning findings resolved (preview)-- Code repositories should have secret scanning findings resolved (preview)-- Code repositories should have dependency vulnerability scanning findings resolved (preview)-- Code repositories should have infrastructure as code scanning findings resolved (preview)-- GitHub repositories should have code scanning enabled (preview)-- GitHub repositories should have Dependabot scanning enabled (preview)-- GitHub repositories should have secret scanning enabled (preview)--The recommendation environment source will be updated from `Azure` to `AzureDevOps` or `GitHub`. --The format for resource IDs will be changed from: --`Microsoft.SecurityDevOps/githubConnectors/owners/repos/` --To: --`Microsoft.Security/securityConnectors/devops/azureDevOpsOrgs/projects/repos` -`Microsoft.Security/securityConnectors/devops/gitHubOwners/repos` --As a part of the migration, source code management system specific recommendations will be created for security findings: --- GitHub repositories should have code scanning findings resolved (preview)-- GitHub repositories should have secret scanning findings resolved (preview)-- GitHub repositories should have dependency vulnerability scanning findings resolved (preview)-- GitHub repositories should have infrastructure as code scanning findings resolved (preview)-- GitHub repositories should have code scanning enabled (preview)-- GitHub repositories should have Dependabot scanning enabled (preview)-- GitHub repositories should have secret scanning enabled (preview)-- Azure DevOps repositories should have code scanning findings resolved (preview)-- Azure DevOps repositories should have secret scanning findings resolved (preview)-- Azure DevOps repositories should have infrastructure as code scanning findings resolved (preview)--Customers that rely on the `resourceID` to query DevOps recommendation data will be affected. For example, Azure Resource Graph queries, workbooks queries, API calls to Microsoft Defender for Cloud. --Queries will need to be updated to include both the old and new `resourceID` to show both, for example, total over time. --Additionally, customers that have created custom queries using the DevOps workbook will need to update the assessment keys for the impacted DevOps security recommendations. The template DevOps workbook is planned to be updated to reflect the new recommendations, although during the actual migration, customers may experience some errors with the workbook. --The experience on the recommendations page will be impacted and require customers to query under "All recommendations" to view the new DevOps recommendations. For Azure DevOps, deprecated assessments may continue to show for a maximum of 14 days if new pipelines are not run. 
Refer to [Defender for DevOps Common questions](/azure/defender-for-cloud/faq-defender-for-devops#why-don-t-i-see-recommendations-for-findings-) for details. --### Preview alerts for DNS servers to be deprecated --**Estimated date for change: August 2023** --Following quality improvement process, security alerts for DNS servers are set to be deprecated in August. For cloud resources, use [Azure DNS](defender-for-dns-introduction.md) to receive the same security value. --The following table lists the alerts to be deprecated: --| AlertDisplayName | AlertType | -|--|--| -| Communication with suspicious random domain name (Preview) | DNS_RandomizedDomain -| Communication with suspicious domain identified by threat intelligence (Preview) | DNS_ThreatIntelSuspectDomain | -| Digital currency mining activity (Preview) | DNS_CurrencyMining | -| Network intrusion detection signature activation (Preview) | DNS_SuspiciousDomain | -| Attempted communication with suspicious sinkholed domain (Preview) | DNS_SinkholedDomain | -| Communication with possible phishing domain (Preview) | DNS_PhishingDomain| -| Possible data transfer via DNS tunnel (Preview) | DNS_DataObfuscation | -| Possible data exfiltration via DNS tunnel (Preview) | DNS_DataExfiltration | -| Communication with suspicious algorithmically generated domain (Preview) | DNS_DomainGenerationAlgorithm | -| Possible data download via DNS tunnel (Preview) | DNS_DataInfiltration | -| Anonymity network activity (Preview) | DNS_DarkWeb | -| Anonymity network activity using web proxy (Preview) | DNS_DarkWebProxy | --### Deprecate and replace recommendations App Service Client Certificates --**Estimated date for change: August 2023** --App Service policies are set to be deprecated and replaced so that they only monitor apps using HTTP 1.1 since HTTP 2.0 on App Service doesn't support client certificates. The existing policies that enforce client certificates require an additional check to determine if Http 2.0 is being used by the app. Adding this additional check requires a change to the policy "effect" from Audit to AuditIfNotExists. Policy "effect" changes require deprecation of the old version of the policy and the creation of a replacement. --Policies in this scope: --- App Service apps should have Client Certificates (Incoming client certificates) enabled-- App Service app slots should have Client Certificates (Incoming client certificates) enabled-- Function apps should have Client Certificates (Incoming client certificates) enabled-- Function app slots should have Client Certificates (Incoming client certificates) enabled--Customers who are currently using this policy will need to ensure they have the new policies with similar names enabled and assigned to their intended scope. --### Change to the Log Analytics daily cap --Azure monitor offers the capability to [set a daily cap](../azure-monitor/logs/daily-cap.md) on the data that is ingested on your Log analytics workspaces. However, Defender for Cloud security events are currently not supported in those exclusions. --Starting on September 18, 2023 the Log Analytics Daily Cap will no longer exclude the following set of data types: --- WindowsEvent-- SecurityAlert-- SecurityBaseline-- SecurityBaselineSummary-- SecurityDetection-- SecurityEvent-- WindowsFirewall-- MaliciousIPCommunication-- LinuxAuditLog-- SysmonEvent-- ProtectionStatus-- Update-- UpdateSummary-- CommonSecurityLog-- Syslog--At that time, all billable data types will be capped if the daily cap is met. 
This change improves your ability to fully contain costs from higher-than-expected data ingestion. --Learn more about [workspaces with Microsoft Defender for Cloud](../azure-monitor/logs/daily-cap.md#workspaces-with-microsoft-defender-for-cloud). +### DevOps Resource Deduplication for Defender for DevOps -## Deprecating and replacing "Microsoft Defender for Storage plan should be enabled" recommendation +**Estimated date for change: November 2023** -**Estimated date for change: September 2023** +To improve the Defender for DevOps user experience and enable further integration with Defender for Cloud's rich set of capabilities, Defender for DevOps will no longer support duplicate instances of a DevOps organization to be onboarded to an Azure tenant. -The recommendation `Microsoft Defender for Storage plan should be enabled` will be deprecated on public clouds and will remain available on Azure Government cloud. This recommendation will be replaced by a new recommendation: `Microsoft Defender for Storage plan should be enabled with Malware Scanning and Sensitive Data Threat Detection`. This recommendation ensures that Defender for Storage is enabled at the subscription level with malware scanning and sensitive data threat detection capabilities. +If you don't have an instance of a DevOps organization onboarded more than once to your organization, no further action is required. If you do have more than one instance of a DevOps organization onboarded to your tenant, the subscription owner will be notified and will need to delete the DevOps Connector(s) they don't want to keep by navigating to Defender for Cloud Environment Settings. -| Policy Name | Description | Policy Effect | Version | -|--|--|--|--| -| [Microsoft Defender for Storage should be enabled](https://ms.portal.azure.com/#view/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2fproviders%2fMicrosoft.Authorization%2fpolicyDefinitions%2f640d2586-54d2-465f-877f-9ffc1d2109f4) | Microsoft Defender for Storage detects potential threats to your storage accounts. It helps prevent the three major impacts on your data and workload: malicious file uploads, sensitive data exfiltration, and data corruption. The new Defender for Storage plan includes malware scanning and sensitive data threat detection.This plan also provides a predictable pricing structure (per storage account) for control over coverage and costs. | Audit, disabled | 1.0.0 | +Customers will have until November 14, 2023 to resolve this issue. After this date, only the most recent DevOps Connector created where an instance of the DevOps organization exists will remain onboarded to Defender for DevOps. For example, if Organization Contoso exists in both connectorA and connectorB, and connectorB was created after connectorA, then connectorA will be removed from Defender for DevOps. -Learn more about [Microsoft Defender for Storage](defender-for-storage-introduction.md). +### Changes to Attack Path's Azure Resource Graph table scheme -### DevOps Resource Deduplication for Defender for DevOps +**Estimated date for change: November 2023** -**Estimated date for change: September 2023** +The Attack Path's Azure Resource Graph (ARG) table scheme will be updated. The `attackPathType` property wil be removed and additional properties will be added. 
-To improve the Defender for DevOps user experience and enable further integration with Defender for Cloud's rich set of capabilities, Defender for DevOps will no longer support duplicate instances of a DevOps organization to be onboarded to an Azure tenant. +### Defender for Cloud plan and strategy for the Log Analytics agent deprecation -If you don't have an instance of a DevOps organization onboarded more than once to your organization, no further action is required. If you do have more than one instance of a DevOps organization onboarded to your tenant, the subscription owner will be notified and will need to delete the DevOps Connector(s) they don't want to keep by navigating to Defender for Cloud Environment Settings. +**Estimated date for change: August 2024** -Customers will have until September 30, 2023 to resolve this issue. After this date, only the most recent DevOps Connector created where an instance of the DevOps organization exists will remain onboarded to Defender for DevOps. For example, if Organization Contoso exists in both connectorA and connectorB, and connectorB was created after connectorA, then connectorA will be removed from Defender for DevOps. +The Azure Log Analytics agent, also known as the Microsoft Monitoring Agent (MMA) will be [retired in August 2024.](https://azure.microsoft.com/updates/were-retiring-the-log-analytics-agent-in-azure-monitor-on-31-august-2024/) As a result, features of the two Defender for Cloud plans that rely on the Log Analytics agent are impacted, and they have updated strategies: [Defender for Servers](#defender-for-servers) and [Defender for SQL Server on machines](#defender-for-sql-server-on-machines). ## Next steps |
event-grid | Event Domains | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/event-grid/event-domains.md | Title: Event Domains in Azure Event Grid description: This article describes how to use event domains to manage the flow of custom events to your various business organizations, customers, or applications. Previously updated : 11/17/2022 Last updated : 10/09/2023 # Understand event domains for managing Event Grid topics -An event domain is a management tool for large number of Event Grid topics related to the same application. You can think of it as a meta-topic that can have thousands of individual topics. It allows an event publisher to publish events to thousands of topics at the same time. Domains also give you authentication and authorization control over each topic so you can partition your tenants. This article describes how to use event domains to manage the flow of custom events to your various business organizations, customers, or applications. Use event domains to: +An event domain is a management tool for large number of Event Grid topics related to the same application. You can think of it as a meta-topic that can have thousands of individual topics. It provides one publishing endpoint for all the topics in the domain. When publishing an event, the publisher must specify the target topic in the domain to which it wants to publish. The publisher can send an array or a batch of events where events are sent to different topics in the domain. See the [Publishing events to an event domain](#publishing-to-an-event-domain) section for details. ++Domains also give you authentication and authorization control over each topic so you can partition your tenants. This article describes how to use event domains to manage the flow of custom events to your various business organizations, customers, or applications. Use event domains to: * Manage multitenant eventing architectures at scale. * Manage your authentication and authorization. * Partition your topics without managing each individually. * Avoid individually publishing to each of your topic endpoints. +> [!NOTE] +> Event domain is not intended to support broadcast scenario where an event is sent to a domain and each topic in the domain receives a copy of the event. When publishing events, the publisher must specify the target topic in the domain to which it wants to publish. If the publisher wants to publish the same event payload to multiple topics in the domain, the publisher needs to duplicate the event payload, and change the topic name, and publish them to Event Grid using the domain endpoint, either individually or as a batch. + ## Example use case [!INCLUDE [domain-example-use-case.md](./includes/domain-example-use-case.md)] Subscribing to events for a topic within an event domain is the same as [creatin ### Domain scope subscriptions -Event domains also allow for domain-scope subscriptions. An event subscription on an event domain will receive all events sent to the domain regardless of the topic the events are sent to. Domain scope subscriptions can be useful for management and auditing purposes. +Event domains also allow for domain-scope subscriptions. An event subscription on an event domain receives all events sent to the domain regardless of the topic the events are sent to. Domain scope subscriptions can be useful for management and auditing purposes. 
## Publishing to an event domain -When you create an event domain, you're given a publishing endpoint similar to if you had created a topic in Event Grid. To publish events to any topic in an event domain, push the events to the domain's endpoint the [same way you would for a custom topic](./post-to-custom-topic.md). The only difference is that you must specify the topic you'd like the event to be delivered to. For example, publishing the following array of events would send event with `"id": "1111"` to topic `foo` while the event with `"id": "2222"` would be sent to topic `bar`: +When you create an event domain, you're given a publishing endpoint similar to if you had created a topic in Event Grid. To publish events to any topic in an event domain, push the events to the domain's endpoint the [same way you would for a custom topic](./post-to-custom-topic.md). The only difference is that you must specify the topic you'd like the event to be delivered to. For example, publishing the following array of events would send event with `"id": "1111"` to topic `foo` while the event with `"id": "2222"` would be sent to topic `bar`. +++# [Event Grid event schema](#tab/event-grid-event-schema) +When using the **Event Grid event schema**, specify the name of the Event Grid topic in the domain as a value for the `topic` property. In the following example, `topic` property is set to `foo` for the first event and to `bar` for the second event. ```json [{ When you create an event domain, you're given a publishing endpoint similar to i "dataVersion": "1.0" }] ```+# [Cloud event schema](#tab/cloud-event-schema) ++When using the **cloud event schema**, specify the name of the Event Grid topic in the domain as a value for the `source` property. In the following example, `source` property is set to `foo` for the first event and to `bar` for the second event. ++If you want to use a different field to specify the intended topic in the domain, specify input schema mapping when creating the domain. For example, if you're using the REST API, use the [properties.inputSchemaMapping](/rest/api/eventgrid/controlplane-preview/domains/create-or-update#jsoninputschemamapping) property when to map that field to `properties.topic`. If you're using the .NET SDK, use [`EventGridJsonInputSchemaMapping `](/dotnet/api/azure.resourcemanager.eventgrid.models.eventgridjsoninputschemamapping). Other SDKs also support the schema mapping. ++```json +[{ + "source": "foo", + "id": "1111", + "type": "maintenanceRequested", + "subject": "myapp/vehicles/diggers", + "time": "2018-10-30T21:03:07+00:00", + "data": { + "make": "Contoso", + "model": "Small Digger" + }, + "specversion": "1.0" +}, +{ + "source": "bar", + "id": "2222", + "type": "maintenanceCompleted", + "subject": "myapp/vehicles/tractors", + "time": "2018-10-30T21:04:12+00:00", + "data": { + "make": "Contoso", + "model": "Big Tractor" + }, + "specversion": "1.0" +}] +``` ++ Event domains handle publishing to topics for you. Instead of publishing events to each topic you manage individually, you can publish all of your events to the domain's endpoint. Event Grid makes sure each event is sent to the correct topic. |
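As a rough illustration of pushing the array above to the domain's endpoint, the following Python sketch posts it with the same `aeg-sas-key` header used for a custom topic. The endpoint and access key are placeholders; each event's `topic` value routes it to the corresponding topic in the domain.

```python
# Sketch only: post the example array to the domain endpoint, the same way you would
# publish to a custom topic. Endpoint and key are placeholders; the "topic" field in
# each event selects the topic inside the domain.
import requests

DOMAIN_ENDPOINT = "https://<your-domain>.<region>-1.eventgrid.azure.net/api/events"  # placeholder
DOMAIN_KEY = "<domain-access-key>"  # placeholder

events = [
    {
        "topic": "foo",  # delivered to topic "foo" in the domain
        "id": "1111",
        "eventType": "maintenanceRequested",
        "subject": "myapp/vehicles/diggers",
        "eventTime": "2018-10-30T21:03:07+00:00",
        "data": {"make": "Contoso", "model": "Small Digger"},
        "dataVersion": "1.0",
    },
    {
        "topic": "bar",  # delivered to topic "bar" in the domain
        "id": "2222",
        "eventType": "maintenanceCompleted",
        "subject": "myapp/vehicles/tractors",
        "eventTime": "2018-10-30T21:04:12+00:00",
        "data": {"make": "Contoso", "model": "Big Tractor"},
        "dataVersion": "1.0",
    },
]

response = requests.post(DOMAIN_ENDPOINT, json=events, headers={"aeg-sas-key": DOMAIN_KEY}, timeout=10)
response.raise_for_status()
```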
event-grid | Mqtt Publish And Subscribe Portal | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/event-grid/mqtt-publish-and-subscribe-portal.md | After a successful installation of Step, you should open a command prompt in you 2. On the **Topic spaces** page, select **+ Topic space** on the toolbar. :::image type="content" source="./media/mqtt-publish-and-subscribe-portal/create-topic-space-menu.png" alt-text="Screenshot of Topic spaces page with create button selected." lightbox="./media/mqtt-publish-and-subscribe-portal/create-topic-space-menu.png":::-1. On the **Create topic space** page, enter a name for the topic space. +1. Provide a **name** for the topic space on the **Create topic space** page. 1. Select **+ Add topic template**. :::image type="content" source="./media/mqtt-publish-and-subscribe-portal/create-topic-space-name.png" alt-text="Screenshot of Create topic space with the name."::: |
event-hubs | Event Hubs About | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/event-hubs/event-hubs-about.md | Title: What is Azure Event Hubs? - a Big Data ingestion service -description: Learn about Azure Event Hubs, a Big Data streaming service that ingests millions of events per second. + Title: Azure Event Hubs – data streaming platform with Kafka support +description: Learn about Azure Event Hubs, A real-time data streaming platform with native Apache Kafka support. Previously updated : 03/07/2023 Last updated : 10/09/2023 -# What is Azure Event Hubs? — A big data streaming platform and event ingestion service -Event Hubs is a modern big data streaming platform and event ingestion service that can seamlessly integrate with other Azure and Microsoft services, such as Stream Analytics, Power BI, and Event Grid, along with outside services like Apache Spark. The service can process millions of events per second with low latency. The data sent to an event hub (Event Hubs instance) can be transformed and stored by using any real-time analytics providers or batching or storage adapters. +# Azure Event Hubs – A real-time data streaming platform with native Apache Kafka support +Azure Event Hubs is a cloud native data streaming service that can stream millions of events per second, with low latency, from any source to any destination. Event Hubs is compatible with Apache Kafka, and it enables you to run existing Kafka workloads without any code changes. -## Why use Event Hubs? -Data is valuable only when there's an easy way to process and get timely insights from data sources. Event Hubs provides a distributed stream processing platform with low latency and seamless integration, with data and analytics services inside and outside Azure to build your complete big data pipeline. +Using Event Hubs to ingest and store streaming data, businesses can harness the power of streaming data to gain valuable insights, drive real-time analytics, and respond to events as they happen, enhancing overall efficiency and customer experience. + :::image type="content" source="./media/event-hubs-about/event-streaming-platform.png" alt-text="Diagram that shows how Azure Event Hubs fits in an event streaming platform."::: -Event Hubs represents the "front door" for an event pipeline, often called an **event ingestor** in solution architectures. An event ingestor is a component or service that sits between event publishers and event consumers to decouple the production of events from the consumption of those events. Event Hubs provides a unified streaming platform with time retention buffer, decoupling event producers from event consumers. +Azure Event Hubs is the preferred event ingestion layer of any event streaming solution that you build on top of Azure. It seamlessly integrates with data and analytics services inside and outside Azure to build your complete data streaming pipeline to serve following use cases. -The following sections describe key features of the Azure Event Hubs service: +- [Real-time analytics with Azure Stream Analytics](./process-data-azure-stream-analytics.md) to generate real-time insights from streaming data. +- Analyze and explore streaming data with Azure Data Explorer. +- Create your own cloud native applications, functions, or microservices that run on streaming data from Event Hubs. +- Stream events with schema validation using a built-in schema registry to ensure quality and compatibility of streaming data. 
-## Fully managed PaaS -Event Hubs is a fully managed Platform-as-a-Service (PaaS) with little configuration or management overhead, so you focus on your business solutions. [Event Hubs for Apache Kafka ecosystems](azure-event-hubs-kafka-overview.md) gives you the PaaS Kafka experience without having to manage, configure, or run your clusters. -## Event Hubs for Apache Kafka -Azure Event Hubs for Apache Kafka ecosystems enables [Apache Kafka (1.0 and later)](https://kafka.apache.org/) clients and applications to talk to Event Hubs. You don't need to set up, configure, and manage your own Kafka and Zookeeper clusters or use some Kafka-as-a-Service offering not native to Azure. For more information, see [Event Hubs for Apache Kafka ecosystems](azure-event-hubs-kafka-overview.md). +## Key capabilities? +### Apache Kafka on Azure Event Hubs +Azure Event Hubs is a multi-protocol event streaming engine that natively supports AMQP, Apache Kafka and HTTPs protocols. Since it supports Apache Kafka, you bring Kafka workloads to Azure Event Hubs without doing any code change. You don't need to set up, configure, and manage your own Kafka clusters or use some Kafka-as-a-Service offering not native to Azure. -## Schema Registry in Azure Event Hubs -Schema Registry in Event Hubs provides a centralized repository for managing schemas of events streaming applications. Azure Schema Registry comes free with every Event Hubs namespace, and it integrates seamlessly with your Kafka applications or Event Hubs SDK based applications. +Event Hubs is built from the ground up as a cloud native broker engine. Hence you can run Kafka workloads with better performance, better cost efficiency and with no operational overhead. -It ensures data compatibility and consistency across event producers and consumers, enabling seamless schema evolution, validation, and governance, and promoting efficient data exchange and interoperability. For more information, see [Schema Registry in Azure Event Hubs](schema-registry-overview.md). +### Schema Registry in Azure Event Hubs +Azure Schema Registry in Event Hubs provides a centralized repository for managing schemas of events streaming applications. Azure Schema Registry comes free with every Event Hubs namespace, and it integrates seamlessly with your Kafka applications or Event Hubs SDK based applications. -## Support for real-time and batch processing -Ingest, buffer, store, and process your stream in real time to get actionable insights. Event Hubs uses a [partitioned consumer model](event-hubs-scalability.md#partitions), enabling multiple applications to process the stream concurrently and letting you control the speed of processing. Azure Event Hubs also integrates with [Azure Functions](../azure-functions/index.yml) for a serverless architecture. -## Capture event data -Capture your data in near-real time in an [Azure Blob storage](https://azure.microsoft.com/services/storage/blobs/) or [Azure Data Lake Storage](https://azure.microsoft.com/services/data-lake-store/) for long-term retention or micro-batch processing. You can achieve this behavior on the same stream you use for deriving real-time analytics. Setting up capture of event data is fast. There are no administrative costs to run it, and it scales automatically with Event Hubs [throughput units](event-hubs-scalability.md#throughput-units) or [processing units](event-hubs-scalability.md#processing-units). Event Hubs enables you to focus on data processing rather than on data capture. 
For more information, see [Event Hubs Capture](event-hubs-capture-overview.md). -## Scalable -With Event Hubs, you can start with data streams in megabytes, and grow to gigabytes or terabytes. The [Autoinflate](event-hubs-auto-inflate.md) feature is one of the many options available to scale the number of throughput units or processing units to meet your usage needs. +It ensures data compatibility and consistency across event producers and consumers. Schema Registry enables seamless schema evolution, validation, and governance, and promoting efficient data exchange and interoperability. +Schema Registry seamlessly integrates with your existing Kafka applications and it supports multiple schema definitions formats including Avro and JSON Schemas. ++### Real-time event stream processing with Azure Stream Analytics +Event Hubs integrates seamlessly with Azure Stream Analytics to enable real-time stream processing. With the built-in no-code editor, you can effortlessly develop a Stream Analytics job using drag-and-drop functionality, without writing any code. + -## Rich ecosystem -With a broad ecosystem available for the industry-standard AMQP 1.0 protocol and SDKs available in various languages: [.NET](https://github.com/Azure/azure-sdk-for-net/), [Java](https://github.com/Azure/azure-sdk-for-java/), [Python](https://github.com/Azure/azure-sdk-for-python/), [JavaScript](https://github.com/Azure/azure-sdk-for-js/), you can easily start processing your streams from Event Hubs. All supported client languages provide low-level integration. The ecosystem also provides you with seamless integration with Azure services like Azure Stream Analytics and Azure Functions and thus enables you to build serverless architectures. +Alternatively, developers can use the SQL-based Stream Analytics query language to perform real-time stream processing and take advantage of a wide range of functions for analyzing streaming data. +### Exploring streaming data with Azure Data Explorer +Azure Data Explorer is a fully managed platform for big data analytics that delivers high performance and allows for the analysis of large volumes of data in near real time. By integrating Event Hubs with Azure Data Explorer, you can easily perform near real-time analytics and exploration of streaming data. -## Event Hubs premium and dedicated -Event Hubs **premium** caters to high-end streaming needs that require superior performance, better isolation with predictable latency, and minimal interference in a managed multitenant PaaS environment. On top of all the features of the standard offering, the premium tier offers several extra features such as [dynamic partition scale up](dynamically-add-partitions.md), extended retention, and [customer-managed-keys](configure-customer-managed-key.md). For more information, see [Event Hubs Premium](event-hubs-premium-overview.md). -Event Hubs **dedicated** tier offers single-tenant deployments for customers with the most demanding streaming needs. This single-tenant offering has a guaranteed 99.99% SLA and is available only on our dedicated pricing tier. An Event Hubs cluster can ingress millions of events per second with guaranteed capacity and subsecond latency. Namespaces and event hubs created within the dedicated cluster include all features of the premium offering and more. For more information, see [Event Hubs Dedicated](event-hubs-dedicated-overview.md). -For more information, see [comparison between Event Hubs tiers](event-hubs-quotas.md). 
+### Rich ecosystem– Azure functions, SDKs and Kafka ecosystem +Ingest, buffer, store, and process your stream in real time to get actionable insights. Event Hubs uses a partitioned consumer model, enabling multiple applications to process the stream concurrently and letting you control the speed of processing. Azure Event Hubs also integrates with Azure Functions for a serverless architecture. +With a broad ecosystem available for the industry-standard AMQP 1.0 protocol and SDKs available in various languages: .NET, Java, Python, JavaScript, you can easily start processing your streams from Event Hubs. All supported client languages provide low-level integration. -## Event Hubs on Azure Stack Hub -The Event Hubs service on Azure Stack Hub allows you to realize hybrid cloud scenarios. Streaming and event-based solutions are supported for both on-premises and Azure cloud processing. Whether your scenario is hybrid (connected), or disconnected, your solution can support processing of events/streams at large scale. Your scenario is only bound by the Event Hubs cluster size, which you can provision according to your needs. +The ecosystem also provides you with seamless integration Azure Functions, Azure Spring Apps, Kafka Connectors and other data analytics platforms and technologies such as Apache Spark and Apache Flink. -The Event Hubs editions (on Azure Stack Hub and on Azure) offer a high degree of feature parity. This parity means SDKs, samples, PowerShell, CLI, and portals offer a similar experience, with few differences. -For more information, see [Event Hubs on Azure Stack Hub overview](/azure-stack/user/event-hubs-overview). +### Flexible and cost-efficient event streaming +You can experience flexible and cost-efficient event streaming through Event Hubs' diverse selection of tiers – including Standard, Premium, and Dedicated. These options cater to data streaming needs ranging from a few MB/s to several GB/s, allowing you to choose the perfect match for your requirements. -## Key architecture components -Event Hubs contains the following key components. +### Scalable +With Event Hubs, you can start with data streams in megabytes, and grow to gigabytes or terabytes. The [Autoinflate](event-hubs-auto-inflate.md) feature is one of the many options available to scale the number of throughput units or processing units to meet your usage needs. ++### Capture streaming data for long term retention and batch analytics +Capture your data in near-real time in an Azure Blob storage or Azure Data Lake Storage for long-term retention or micro-batch processing. You can achieve this behavior on the same stream you use for deriving real-time analytics. Setting up capture of event data is fast. -| Component | Description | -| | -- | -| Event producers | Any entity that sends data to an event hub. Event publishers can publish events using HTTPS or AMQP 1.0 or Apache Kafka (1.0 and higher). | -| Partitions | Each consumer only reads a specific subset, or a partition, of the message stream. | -| Consumer groups | A view (state, position, or offset) of an entire event hub. Consumer groups enable consuming applications to each have a separate view of the event stream. They read the stream independently at their own pace and with their own offsets. | -| Event receivers | Any entity that reads event data from an event hub. All Event Hubs consumers connect via the AMQP 1.0 session. The Event Hubs service delivers events through a session as they become available. 
All Kafka consumers connect via the Kafka protocol 1.0 and later. | -| [Throughput units (standard tier)](event-hubs-scalability.md#throughput-units) or [processing units (premium tier)](event-hubs-scalability.md#processing-units) or [capacity units (dedicated)](event-hubs-dedicated-overview.md) | Prepurchased units of capacity that control the throughput capacity of Event Hubs. | +## How it works? +Event Hubs provides a unified event streaming platform with time retention buffer, decoupling event producers from event consumers. The producers and consumer applications can perform large scale data ingestion through multiple protocols. -The following figure shows the Event Hubs stream processing architecture: -![Event Hubs](./media/event-hubs-about/event_hubs_architecture.png) +The following figure shows the key components of Event Hubs architecture: +The key functional components of Event Hubs include: +- **Event Hub/Kafka topic**: In Event Hubs, you can organize events into event hubs or Kafka topic. It's an append only distributed log, which can comprise of one or more partitions. +- **Partitions** are used to scale an event hub. They are like lanes in a freeway. If you need more streaming throughput, you need to add more partitions. +- **Producer applications** can ingest data to an event hub using Event Hubs SDKs or any Kafka producer client. +- **Consumer applications** consume data by seeking through the event log and maintaining consumer offset. Consumers can be based on Kafka consumer clients or Event Hubs SDK as well. +- **Consumer Group** is a logical group of consumer instances that reads data from an event hub/Kafka topic. It enables multiple consumers to read the same streaming data in an event hub independently at their own pace and with their own offsets. +- **Namespace** is the management container for one or more event hubs or Kafka topics. The management tasks such as allocating streaming capacity, configuring network security, enabling Geo Disaster recovery etc. are handled at the namespace level. -> [!NOTE] -> For more information, see [Event Hubs features or components](event-hubs-features.md). ## Next steps -To get started using Event Hubs, see the **Send and receive events** tutorials: +To get started using Event Hubs, see the following quick start guides. +### Stream data using Event Hubs SDK (AMQP) +You can use any of the following samples to stream data to Event Hubs using SDKs. - [.NET Core](event-hubs-dotnet-standard-getstarted-send.md) - [Java](event-hubs-java-get-started-send.md) - [Spring](/azure/developer/java/spring-framework/configure-spring-cloud-stream-binder-java-app-azure-event-hub?toc=/azure/event-hubs/TOC.json) To get started using Event Hubs, see the **Send and receive events** tutorials: - [C](event-hubs-c-getstarted-send.md) (send only) - [Apache Storm](event-hubs-storm-getstarted-receive.md) (receive only) +### Stream data using Apache Kafka +You can use following samples to stream data from your Kafka applications to Event Hubs. +- [Using Event Hubs with Kafka applications](event-hubs-java-get-started-send.md) ++### Schema validation with Schema Registry +You can use Event Hubs Schema Registry to perform schema validation for your event streaming applications. -To learn more about Event Hubs, see the following articles: +- [Schema validation for Kafka applications](schema-registry-kafka-java-send-receive-quickstart.md) -- [Event Hubs features overview](event-hubs-features.md)-- [Frequently asked questions](event-hubs-faq.yml). |
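To make the producer, consumer, and consumer-group concepts concrete, here is a minimal sketch using the `azure-eventhub` Python SDK (v5). The connection string, event hub name, and payload are placeholders, and the consumer call blocks until interrupted.

```python
# Sketch only: send a small batch and read it back with the azure-eventhub SDK (v5).
# Connection string, event hub name, and payload are placeholders.
from azure.eventhub import EventData, EventHubConsumerClient, EventHubProducerClient

CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
EVENT_HUB = "<event-hub-name>"                         # placeholder

# Producer: events are appended to one of the event hub's partitions.
producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name=EVENT_HUB)
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"deviceId": "sensor-1", "temperature": 21.5}'))
    producer.send_batch(batch)

# Consumer: a consumer group reads the stream at its own pace, tracking its own offset.
def on_event(partition_context, event):
    print(partition_context.partition_id, event.body_as_str())

consumer = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group="$Default", eventhub_name=EVENT_HUB
)
with consumer:
    # "-1" starts from the beginning of the partition; this call blocks until interrupted.
    consumer.receive(on_event=on_event, starting_position="-1")
```

The same event hub can also be produced to and consumed from with standard Kafka clients, since the namespace exposes a Kafka endpoint.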
hdinsight-aks | Cluster Storage | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/cluster-storage.md | + + Title: Introduction to cluster storage +description: Understand how Azure HDInsight on AKS integrates with Azure Storage ++ Last updated : 08/3/2023+++# Introduction to cluster storage +++Azure HDInsight on AKS can seamlessly integrate with Azure Storage, which is a general-purpose storage solution that works well with many other Azure services. +Azure Data Lake Storage Gen2 (ADLS Gen 2) is the default file system for the clusters. ++The storage account could be used as the default location for data, cluster logs, and other output that are generated during cluster operation. It could also be a default storage for the Hive catalog that depends on the cluster type. ++For more information, see [Introduction to Azure Data Lake Storage Gen2](/azure/storage/blobs/create-data-lake-storage-account). ++## Managed identities for secure file access ++Azure HDInsight on AKS uses managed identities (MSI) to secure cluster access to files in Azure Data Lake Storage Gen2. Managed identity is a feature of Azure Active Directory that provides Azure services with a set of automatically managed credentials. These credentials can be used to authenticate to any service that supports Active Directory authentication. Moreover, managed identities don't require you to store credentials in code or configuration files. ++In Azure HDInsight on AKS, once you select a managed identity and storage during cluster creation, the managed identity can seamlessly work with storage for data management, provided the **Storage Blob Data Owner** role is assigned to the user-assigned MSI. ++The following table outlines the supported storage options for Azure HDInsight on AKS (public preview): ++|Cluster Type|Supported Storage|Connection|Role on Storage| +||||| +|Trino, Apache Flink, and Apache Spark |ADLS Gen2|Cluster user-assigned managed identity (MSI) | The user-assigned MSI needs to have **Storage Blob Data Owner** role on the storage account.| ++> [!NOTE] +> To share a storage account across multiple clusters, you can just assign the corresponding cluster user-assigned MSI ΓÇ£Storage Blob Data OwnerΓÇ¥ on the shared storage account. Learn how to [assign a role](/azure/role-based-access-control/role-assignments-portal#step-2-open-the-add-role-assignment-page). ++After that, you can use the full storage `abfs://` path to access the data via your applications. ++For more information, see [Managed identities for Azure resources](/azure/active-directory/managed-identities-azure-resources/overview). +<br>Learn how to [create an ADLS Gen2 account](/azure/storage/blobs/create-data-lake-storage-account). ++## Azure HDInsight on AKS storage architecture ++The following diagram provides an abstract view of the Azure HDInsight on AKS architecture of Azure Storage. +++### Storage management ++Currently, Azure HDInsight on AKS doesn't support storage accounts with soft delete enabled, make sure you disable soft delete for your storage account. + |
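As an illustration of the `abfs://` access pattern described above, the following PySpark sketch reads a file from ADLS Gen2 on a cluster whose user-assigned MSI holds the Storage Blob Data Owner role; the container, storage account, and file path are placeholders.

```python
# Sketch only: read data over the abfs:// path from a Spark cluster whose
# user-assigned MSI has the Storage Blob Data Owner role on the account.
# The container, storage account, and file path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-gen2-read").getOrCreate()

path = "abfs://<container>@<storage-account>.dfs.core.windows.net/data/sample.csv"  # placeholder
df = spark.read.option("header", "true").csv(path)
df.show(5)
```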
hdinsight-aks | Concept Azure Monitor Integration | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/concept-azure-monitor-integration.md | + + Title: Metrics and monitoring in HDInsight on AKS +description: Learn about how HDInsight on AKS interacts with Azure Monitoring. ++ Last updated : 08/29/2023+++# Azure Monitor integration +++HDInsight on AKS offers an integration with Azure Monitor that can be used to monitor cluster pools and their clusters. ++Azure Monitor collects metrics and logs from multiple resources into an Azure Monitor Log Analytics workspace, which presents the data as structured, queryable tables that can be used to configure custom alerts. Azure Monitor logs provide an excellent overall experience for monitoring workloads and interacting with logs, especially if you have multiple clusters. ++Setting up Azure Monitor alerts is easy and beneficial. These alerts are triggered when the value of a metric or the results of a query meet certain conditions. A severity level for the alert can be added in addition to the name. The ability to specify severity is a powerful tool that can be used when creating multiple alerts. You can learn more about this topic and how to set up alerts [here](/azure/azure-monitor/alerts/alerts-log). ++The integration offers various other capabilities such as a flexible canvas for data analysis and the creation of rich visual reports using [Workbooks](/azure/azure-monitor/visualize/workbooks-overview) and as well cross cluster monitoring. ++Azure HDInsight on AKS comes with integrated monitoring experience with Azure services like Azure managed Prometheus along with Azure managed Grafana dashboards for monitoring. ++- Azure Managed Prometheus is a service that monitors your cloud environments. The monitoring is to maintain their availability, performance, and cluster metrics. It collects data generated by resources in your Azure instances and from other monitoring tools. The data is used to provide analysis across multiple sources. +- Azure Managed Grafana is a data visualization platform built on top of the Grafana software by Grafana Labs. It's built as a fully managed Azure service operated and supported by Microsoft. Grafana helps you bring together metrics, logs, and traces into a single user interface. With its extensive support for data sources and graphing capabilities, you can view and analyze your application and infrastructure telemetry data in real-time. ++HDInsight on AKS also offers an out-of-box monitoring feature that provides premade dashboards based on cluster and service health information on top of the Azure monitor integration for more flexibility and a better visualization experience. ++For more information +- [How to enable log analytics](how-to-azure-monitor-integration.md). +- [Using Azure managed Prometheus & Grafana](monitor-with-prometheus-grafana.md) |
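For the Log Analytics side of this integration, a hedged sketch of querying the workspace from Python with the `azure-monitor-query` package follows. The workspace ID and the table used in the KQL are placeholders, since the actual table names depend on what the integration emits.

```python
# Sketch only: run a KQL query against the Log Analytics workspace used by the integration.
# Requires: pip install azure-identity azure-monitor-query
# Workspace ID and the table referenced in the query are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query="AzureDiagnostics | take 10",           # placeholder table and query
    timespan=timedelta(hours=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```

The same kind of query can back a custom alert rule, with a severity chosen per the alerting guidance linked above.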
hdinsight-aks | Concept Security | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/concept-security.md | + + Title: Security in HDInsight on AKS +description: An introduction to security with managed identity from Azure Active Directory in HDInsight on AKS. ++ Last updated : 08/29/2023+++# Overview of enterprise security in Azure HDInsight on AKS +++Azure HDInsight on AKS is secure by default, and there are several methods to address your enterprise security needs. Most of these solutions are activated by default. ++This article covers the overall security architecture and security solutions, dividing them into four traditional security pillars: perimeter security, authentication, authorization, and encryption. ++## Security architecture ++Enterprise readiness for any software requires stringent security checks to prevent and address threats that may arise. HDInsight on AKS provides a multi-layered security model to protect you at multiple layers. The security architecture uses modern authorization methods based on managed identities (MSI). All storage access is through MSI, and database access is through username/password. The password is stored in Azure [Key Vault](../key-vault/general/basic-concepts.md), defined by the customer. This makes the setup robust and secure by default. ++The following diagram illustrates a high-level technical architecture of security in HDInsight on AKS. +++## Enterprise security pillars ++One way of looking at enterprise security is to divide security solutions into four main groups based on the type of control. These groups are also called security pillars and are of the following types: perimeter security, authentication, authorization, and encryption. ++### Perimeter security ++Perimeter security in HDInsight on AKS is achieved through [virtual networks](../hdinsight/hdinsight-plan-virtual-network-deployment.md). An enterprise admin can create a cluster inside a virtual network (VNET) and use [network security groups (NSG)](./secure-traffic-by-nsg.md) to restrict access to the virtual network. ++### Authentication ++HDInsight on AKS provides Azure Active Directory-based authentication for cluster login and uses managed identities (MSI) to secure cluster access to files in Azure Data Lake Storage Gen2. Managed identity is a feature of Azure Active Directory that provides Azure services with a set of automatically managed credentials. With this setup, enterprise employees can sign in to the cluster nodes by using their domain credentials. +A managed identity from Azure Active Directory (Azure AD) allows your app to easily access other Azure AD-protected resources such as Azure Key Vault, Storage, SQL Server, and Database. The identity is managed by the Azure platform and doesn't require you to provision or rotate any secrets. +This solution is key to securing access to your HDInsight on AKS cluster and other dependent resources. Managed identities make your app more secure by eliminating secrets from your app, such as credentials in the connection strings. ++As part of the cluster creation process, you create a user-assigned managed identity, a standalone Azure resource that manages the access to your dependent resources. ++### Authorization ++A best practice most enterprises follow is making sure that not every employee has full access to all enterprise resources. Likewise, the admin can define role-based access control policies for the cluster resources. ++The resource owners can configure role-based access control (RBAC). Configuring RBAC policies allows you to associate permissions with a role in the organization. This layer of abstraction makes it easier to ensure people have only the permissions needed to perform their work responsibilities. +Authorization for cluster management (control plane) is managed by ARM roles, and cluster data access (data plane) is managed by [cluster access management](./hdinsight-on-aks-manage-authorization-profile.md). +#### Cluster management roles (Control Plane / ARM Roles) ++|Action |HDInsight on AKS Cluster Pool Admin | HDInsight on AKS Cluster Admin| +|-|-|-| +|Create / Delete cluster pool |✅ | | +|Assign permission and roles on the cluster pool |✅| | +|Create/delete cluster |✅| ✅ | +| **Manage Cluster**| | ✅ | +| Configuration Management | |✅| +| Script actions | |✅| +| Library Management | |✅| +| Monitoring | |✅| +| Scaling actions | |✅| ++The above roles are from the ARM operations perspective. For more information, see [Grant a user access to Azure resources using the Azure portal - Azure RBAC](../role-based-access-control/quickstart-assign-role-user-portal.md). ++#### Cluster access (Data Plane) ++You can allow users, service principals, and managed identities to access the cluster through the portal or by using ARM. ++This access enables you to: ++* View clusters and manage jobs. +* Perform all the monitoring and management operations. +* Perform auto scale operations and update the node count. + +The access isn't provided for: +* Cluster deletion +++> [!Important] +> Any newly added user requires the additional role of “Azure Kubernetes Service RBAC Reader” to view the [service health](./service-health.md). ++## Auditing ++Auditing cluster resource access is necessary to track unauthorized or unintentional access of the resources. It's as important as protecting the cluster resources from unauthorized access. ++The resource group admin can view and report all access to the HDInsight on AKS cluster resources and data by using the activity log. The admin can also view and report changes to the access control policies. ++## Encryption ++Protecting data is important for meeting organizational security and compliance requirements. Along with restricting access to data from unauthorized employees, you should encrypt it. The storage and the disks (OS disk and persistent data disk) used by the cluster nodes and containers are encrypted. Data in Azure Storage is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and is FIPS 140-2 compliant. Azure Storage encryption is enabled for all storage accounts, which makes data secure by default. You don't need to modify your code or applications to take advantage of Azure Storage encryption. Encryption of data in transit is handled with TLS 1.2. ++## Compliance ++Azure compliance offerings are based on various types of assurances, including formal certifications, attestations, validations, and authorizations; assessments produced by independent third-party auditing firms; and contractual amendments, self-assessments, and customer guidance documents produced by Microsoft. For HDInsight on AKS compliance information, see the Microsoft [Trust Center](https://www.microsoft.com/trust-center?rtc=1) and the [Overview of Microsoft Azure compliance](/samples/browse/). ++## Shared responsibility model ++The following image summarizes the major system security areas and the security solutions that are available to you. It also highlights which security areas are your responsibilities as a customer and which areas are the responsibility of HDInsight on AKS as the service provider. +++The following table provides links to resources for each type of security solution. ++|Security area |Solutions available |Responsible party| +|-|-|-| +|Data Access Security |[Configure access control lists (ACLs)](../storage/blobs/data-lake-storage-access-control.md) for Azure Data Lake Storage Gen2 |Customer| +| |Enable the [Secure transfer required](../storage/common/storage-require-secure-transfer.md) property on storage|Customer| +| |Configure [Azure Storage firewalls](../storage/common/storage-network-security.md) and virtual networks|Customer| +|Operating system security|Create clusters with the most recent HDInsight on AKS versions|Customer| +|Network security| Configure a [virtual network](../hdinsight/hdinsight-plan-virtual-network-deployment.md)|| +| | Configure [Traffic using Firewall rules](./secure-traffic-by-firewall.md)|Customer| +| | Configure [Outbound traffic required](./required-outbound-traffic.md) |Customer| |
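As a concrete illustration of the note above about the "Azure Kubernetes Service RBAC Reader" role, the following Azure CLI sketch shows one way such an assignment can be made. It isn't taken from the article: the assignee and scope are placeholders, and the exact scope to use depends on your environment.

```azurecli
# Placeholder assignee and scope; adjust both to your own environment.
az role assignment create \
  --assignee "user@contoso.com" \
  --role "Azure Kubernetes Service RBAC Reader" \
  --scope "<cluster-resource-id>"
```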
hdinsight-aks | Create Cluster Error Dictionary | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/create-cluster-error-dictionary.md | + + Title: Create a cluster - error dictionary in Azure HDInsight on AKS +description: Learn how to troubleshoot errors that occur when creating Azure HDInsight on AKS clusters ++ Last updated : 08/31/2023+++# Cluster creation errors on Azure HDInsight on AKS ++This article describes how to troubleshoot and resolve errors that could occur when you create Azure HDInsight on AKS clusters. ++|Sr. No|Error message|Cause|Resolution| +|-|--|--|--| +|1|InternalServerError UnrecognizableError|This error could indicate that an incorrect template was used. Currently, database connectors are allowed only through an ARM template, so validation of the configuration isn't possible in the template.| | +|2|InvalidClusterSpec - ServiceDependencyFailure - Invalid configuration|Max memory per node error.|Refer to the maximum memory configurations [property value types](https://trino.io/docs/current/admin/properties-resource-management.html#query-max-memory-per-node).| +|3|WaitingClusterResourcesReadyTimeOut - Metastoreservice unready|This error can occur because the container name may only contain lowercase letters, numbers, and hyphens, and must begin with a letter or a number.|Each hyphen must be preceded and followed by a nonhyphen character. The name must also be between 3 and 63 characters long.| +|4|InvalidClusterSpec -Invalid configuration - ClusterUpsertActivity|Error: Invalid configuration property `hive.metastore.uri: may not be null`.|[Refer to the Hive connector documentation](https://trino.io/docs/current/connector/hive.html#connector-hive--page-root).| +|5|`InternalServerError - An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding 'EnableRetryOnFailure()' to the 'UseSqlServer' call`.||Retry the operation or open a support ticket with the Azure HDInsight team.| +|6|`InternalServerError - ObjectDisposedException` occurs in RP code.||Retry the operation or open a support ticket with the Azure HDInsight team.| +|7|`PreconditionFailed` - Operation failure due to quota limits on the user subscription.|There's quota validation before cluster creation. But when several clusters are created under the same subscription at the same time, it's possible that the first cluster occupies the quota and the other fails because of quota shortage.|Confirm there's enough quota and retry cluster/cluster pool creation.| +|8|`ReconcileApplicationSecurityGroupError` - Internal AKS error||Retry the operation or open a support ticket with the Azure HDInsight team.| +|9|`ResourceGroupBeingDeleted`|During HDInsight on AKS resource creation or update, the user is also deleting resources in related resource groups.|Don't delete resources in HDInsight-related resource groups while HDInsight on AKS resources are being created or updated.| +|10|`UpsertNodePoolTimeOut - Async operation dependentArmResourceTask has timed out`.|AKS issue - could be due to high traffic in a particular region at the time of the operation.|Retry the operation after some time. If possible, use another region.| +|11|`Authorization_IdentityNotFound - {"code":null,"message":"The identity of the calling application could not be established."}`|The first-party (1P) service principal isn't onboarded to the tenant.|Execute the command to provision the first-party service principal on the new tenant to onboard it.| +|12|`NotFound - ARM/AKS sdk error`|The user tries to update an HDInsight on AKS cluster, but the corresponding agent pool has been deleted.|The corresponding agent pool has been deleted. It isn't recommended to operate on the AKS agent pool directly.| +|13|`AuthorizationFailed - Scope invalid role assignment issue with managed RG and cluster msi`|Lack of permission to perform the operation.|Check whether the service principal app ID mentioned in the error message is owned by you. If yes, grant the permission according to the error message. If not, open a support ticket with the Azure HDInsight team.| +|14|`DeleteAksClusterFailed - {"code":"DeleteAksClusterFailed","message":"An Azure service request has failed. ErrorCode: 'DeleteAksClusterFailed', ErrorMessage: 'Delete HDI cluster namespcae failed. Additional info: 'Can't access a disposed object.\\r\\nObject name: 'Microsoft.Azure.Common.Configuration.ManagedConfiguration was already disposed'.''."}`|RP switched to a new role instance unexpectedly.|Retry the operation or open a support ticket with the Azure HDInsight team.| +|15|`EntityStoreOperationError - ARM/AKS sdk error`|A database operation failed on the AKS side during cluster update.|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|16|`InternalServerError - {"exception":"System.Threading.Tasks.TaskCanceledException","message":"The operation was canceled."}`|This error can be caused by various issues.|Retry the operation or open a support ticket with the Azure HDInsight team.| +|17|`InternalServerError - {"exception":"System.IO.IOException","message":"Unable to read data from the transport connection: A connection attempt failed because the connected party didn't properly respond after a period of time, or established connection failed because connected host has failed to respond."}`|This error can be caused by various issues.|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|18|`InternalServerError - Null reference exception occurs in RP code`.|This error can be caused by various issues.|Retry the operation or open a support ticket with the Azure HDInsight team.| +|19|`InternalServerError - {"code":"InternalServerError","message":"An internal error has occurred, exception: 'InvalidOperationException, Sequence contains no elements.'"}`|This error can be caused by various issues.|Retry the operation or open a support ticket with the Azure HDInsight team.| +|20|`InternalServerError - {"code":"InternalServerError","message":"An internal error has occurred, exception: 'ArgumentNullException, Value can't be null. (Parameter 'roleAssignmentGuid')'"}`|This error can be caused by various issues.|Retry the operation or open a support ticket with the Azure HDInsight team.| +|21|`OperationNotAllowed - {"code":"OperationNotAllowed","message":"An Azure service request has failed. ErrorCode: 'OperationNotAllowed', ErrorMessage: 'Service request failed.\\r\\nStatus: 409 (Conflict)\\r\\n\\r\\nContent:\\r\\n{\\ n \\"code\\": \\"OperationNotAllowed\\",\\ n \\"details\\": null,\\ n \\"message\\": \\"Operation isn't allowed: Another agent pool operation (Scaling) is in progress, wait for it to finish before starting a new operation.`|Another agent pool operation (Scaling) is in progress. This error can be caused by an RP Service Fabric reboot.|Wait for the previous operation to finish before starting a new operation. If the issue persists after a retry, open a support ticket with the Azure HDInsight team.| +|22|`ReconcileVMSSAgentPoolFailed`|There's quota validation before cluster creation. But when several clusters are created under the same subscription at the same time, it's possible that the first cluster occupies the quota and the others fail because of quota shortage.|Confirm there's enough quota and retry cluster/cluster pool creation.| +|23|`ReconcileVMSSAgentPoolFailed` - Unable to establish outbound connection from agents|`AKS/VMSS` side issue: VM has reported a failure.|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|24|`InternalServerError - {"code":"InternalServerError","message":"An internal error has occurred, exception: 'SqlException'"}`|This error is caused by a transient SQL connection issue.|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|25|`NotLatestOperation - ARM/AKS SDK error`|The operation can't proceed. Either the operation has been preempted by another one, or the information needed by the operation failed to be saved (or hasn't been saved yet).|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|26|`ReconcileVMSSAgentPoolFailed - Agent pool drain failed`|There was an issue with the scaling down operation.|Open a support ticket with the Azure HDInsight team.| +|27|`ResourceNotFound - ARM/AKS SDK error`|This error occurs when a required resource is removed or deleted by the user.|Make sure the resource that is mentioned in the error message exists, then retry the operation. If the issue persists, open a support ticket with the Azure HDInsight team.| +|28|`InvalidClusterSpec - The cluster instance deployment failed with reason 'System.DependencyFailure' and message 'Metastoreservice instance _'xyz'_ has invalid request due to - [Hive metastore storage location access check timed out.]`.|The HMS initialization might time out due to SQL Server or storage-related issues.|Open a support ticket with the Azure HDInsight team.| +|29|`InvalidClusterSpec - The cluster instance deployment failed with reason 'System.DependencyFailure' and message 'Metastoreservice instance '_xyz_' has invalid request due to - [Keyvault secrets weren't configured properly. Failed to fetch secrets from keyvault.]`.|This error can occur due to the key vault being inaccessible or the secret key not being available. In some rare cases, this error might be due to slower initialization of the pod identity infrastructure on the cluster nodes.|If you have Log Analytics enabled, check the logs of the `secretprovider-validate` job to identify the reason. Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|30|`FlinkCluster unready - {"FlinkCluster": "Status can't be determined"}`|This error can occur for various reasons, such as an image pull issue, controller pods not being ready, or an issue with MSI.|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|31|`FlinkCluster unready - {"FlinkCluster": "StatefulSet instance 'flink-taskmanager' isn't ready due to - [Ready replicas don't match desired replica count]."}`|This error can occur for various reasons, such as an image pull issue, controller pods not being ready, or an issue with MSI.|Retry the operation after some time. If the issue persists, open a support ticket with the Azure HDInsight team.| +|32|`InvalidClusterSpec (class com.microsoft.azure.hdinsight.services.spark.exception.ClusterConfigException:[SparkClusterValidator#ConfigurationValidator#][ISSUE:(1)-Component config valid:[[{serviceName='yarn-service,componentName=hadoop-config-client}, {serviceName='yarn-service,componentName=hadoop-config}]],current:[[{serviceName='yarn-service,componentName=yarn-config}'`.|This error can occur if the service config consists of components that aren't allowed.|Validate the service config components and retry. If the issue persists, open a support ticket with the Azure HDInsight team.| +|33|`InvalidClusterSpec -1,"conditions":[{"type":"RequestIsValid","status":"UNKNOWN","reason":"UNKNOWN","message":"Unable to determine status of one or more dependencies`.|This error can occur when the HMS, Spark, or YARN services aren't up; it could be related to storage.|Open a support ticket with the Azure HDInsight team.| +|34|`WaitingClusterResourcesReadyTimeOut - Failed to reconcile from generation 1 to 1.`||Open a support ticket with the Azure HDInsight team.| +|35|`WaitingClusterResourcesReadyTimeOut - {"YarnService":"StatefulSet instance 'resourcemanager' isn't ready due to - `` see service status for specific details and how to fix it. Failing services are: YarnService, SparkService"}`|This error can occur when the HMS, Spark, or YARN services aren't up; it could be related to storage.|Open a support ticket with the Azure HDInsight team.| +|36|`InvalidClusterSpec - [spec.configs[0].files[3].fileName: Invalid value: "yarn-env.sh": spec.configs[0].files[3].fileName in body should match '(^yarn-site\\.xml$)|(^capacity-scheduler\\.xml$)|(^core-site\\.xml$)|(^mapred-site\\.xml$)', spec.configs[0].files[3].values: Required value, spec.configs[1].files[2].fileName: Invalid value: "yarn-env.sh": spec.configs[1].files[2].fileName in body should match '(^yarn-site\\.xml$)|(^capacity-scheduler\\.xml$)|(^core-site\\.xml$)|(^mapred-site\\.xml$)', spec.configs[1].files[2].values: Required value]`.|This error can occur when unsupported files are passed in the services configuration.|Validate the service config components and retry. If the issue persists, open a support ticket with the Azure HDInsight team.| +|37|`InvalidClusterSpec - ".AccessDeniedException: Operation failed: "Server failed to authenticate the request. InvalidAuthenticationInfo, "Server failed to authenticate the request.."`|Invalid authentication parameters - the storage location is inaccessible.|Correct the authentication parameters and retry. If the issue persists, open a support ticket with the Azure HDInsight team.| +|38|`InvalidClusterSpec - "_xyz_.dfs.core.windows.net isn't accessible. Reason: HTTP Error -1; url=. AzureADAuthenticator.getTokenCall threw java.net.SocketTimeoutException :. AzureADAuthenticator.getTokenCall threw java.net.SocketTimeoutException : Read timed out.]`.|This error can occur when the pod identity resources take too long to start on the node when the HMS pod is scheduled.|Retry the operation. If the issue persists, open a support ticket with the Azure HDInsight team.| ++## Next steps +* [Troubleshoot cluster configuration](./trino/trino-configuration-troubleshoot.md). |
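Several of the errors above (for example, rows 7 and 22) come down to subscription quota. As a quick, illustrative check before retrying, you can list current compute usage in the target region with the Azure CLI; the region below is a placeholder.

```azurecli
# Placeholder region; shows current vCPU usage against limits so you can confirm quota before retrying.
az vm list-usage --location eastus --output table
```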
hdinsight-aks | Create Cluster Using Arm Template Script | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/create-cluster-using-arm-template-script.md | + + Title: Export ARM template in Azure HDInsight on AKS +description: How to create an ARM template for a cluster using a script in Azure HDInsight on AKS ++ Last updated : 08/29/2023+++# Export cluster ARM template using script +++This article describes how to generate an ARM template for your cluster automatically using a script. You can use the ARM template to modify, clone, or recreate a cluster starting from the existing cluster's configurations. ++## Prerequisites ++* An operational HDInsight on AKS cluster. +* Familiarity with [ARM template authoring and deployment](/azure/azure-resource-manager/templates/overview). + +## Steps to generate ARM template for the cluster ++1. Sign in to the [Azure portal](https://portal.azure.com). ++1. In the Azure portal search bar, type "HDInsight on AKS cluster" and select "Azure HDInsight on AKS clusters" from the drop-down list. + + :::image type="content" source="./media/create-cluster-using-arm-template-script/cloud-portal-search.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster." border="true" lightbox="./media/create-cluster-using-arm-template-script/cloud-portal-search.png"::: + +1. Select your cluster name from the list page. + + :::image type="content" source="./media/create-cluster-using-arm-template-script/cloud-portal-list-view.png" alt-text="Screenshot showing selecting the HDInsight on AKS Cluster you require from the list." border="true" lightbox="./media/create-cluster-using-arm-template-script/cloud-portal-list-view.png"::: + +1. Navigate to the overview blade of your cluster and click on *JSON View* at the top right. + + :::image type="content" source="./media/create-cluster-using-arm-template-script/view-cost-json-view.png" alt-text="Screenshot showing how to view cost and JSON View buttons from the Azure portal." border="true" lightbox="./media/create-cluster-using-arm-template-script/view-cost-json-view.png"::: + +1. Copy the "Resource JSON" and save it to a local JSON file. For example, `template.json`. ++1. Click the following button at the top right in the Azure portal to launch Azure Cloud Shell. ++ :::image type="content" source="./media/create-cluster-using-arm-template-script/cloud-shell.png" alt-text="Screenshot showing Cloud Shell icon."::: + +1. Make sure Cloud Shell is set to "Bash" on the top left and upload your `template.json` file. ++ :::image type="content" source="./media/create-cluster-using-arm-template-script/azure-cloud-shell-template-upload.png" alt-text="Screenshot showing how to upload your template.json file." border="true" lightbox="./media/create-cluster-using-arm-template-script/azure-cloud-shell-template-upload.png"::: + +1. Execute the following command to generate the ARM template. ++ ```azurecli + wget https://hdionaksresources.blob.core.windows.net/common/arm_transform.py ++ python arm_transform.py template.json + ``` + + :::image type="content" source="./media/create-cluster-using-arm-template-script/azure-cloud-shell-script-output.png" alt-text="Screenshot showing results after running the script." border="true" lightbox="./media/create-cluster-using-arm-template-script/azure-cloud-shell-script-output.png"::: ++This script creates an ARM template named `template-modified.json` for your cluster and generates a command to deploy the ARM template. ++Now, your cluster ARM template is ready. You can update the properties of the cluster and finally deploy the ARM template to refresh the resources. To redeploy, you can either use the Azure CLI command output by the script or [deploy an ARM template using the Azure portal](/azure/azure-resource-manager/templates/deploy-portal#deploy-resources-from-custom-template). ++> [!IMPORTANT] +> If you're cloning the cluster or creating a new cluster, you'll need to modify the `name`, `location`, and `fqdn` (the fqdn must match the cluster name). |
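The script prints the exact deployment command to run, so treat the following Azure CLI sketch as illustrative only; the resource group name is a placeholder.

```azurecli
# Placeholder resource group; the command emitted by arm_transform.py is the authoritative one.
az deployment group create \
  --resource-group myResourceGroup \
  --template-file template-modified.json
```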
hdinsight-aks | Create Cluster Using Arm Template | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/create-cluster-using-arm-template.md | + + Title: Export cluster ARM template +description: Learn how to create a cluster ARM template ++ Last updated : 08/29/2023+++# Export cluster ARM template +++This article describes how to generate an ARM template from the resource JSON of your cluster. ++## Prerequisites ++* An operational HDInsight on AKS cluster. +* Familiarity with [ARM template authoring and deployment](/azure/azure-resource-manager/templates/overview). ++## Steps to generate ARM template for the cluster ++1. Sign in to the [Azure portal](https://portal.azure.com). + +1. In the Azure portal search bar, type "HDInsight on AKS cluster" and select "Azure HDInsight on AKS clusters" from the drop-down list. + + :::image type="content" source="./media/create-cluster-using-arm-template/portal-search.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster." border="true" lightbox="./media/create-cluster-using-arm-template/portal-search.png"::: + +1. Select your cluster name from the list page. ++ :::image type="content" source="./media/create-cluster-using-arm-template/portal-search-result.png" alt-text="Screenshot showing selecting the HDInsight on AKS Cluster you require from the list." border="true" lightbox="./media/create-cluster-using-arm-template/portal-search-result.png"::: + +1. Go to the overview blade of your cluster and click on *JSON View* at the top right. ++ :::image type="content" source="./media/create-cluster-using-arm-template/view-cost-json-view.png" alt-text="Screenshot showing how to view cost and JSON View buttons from the Azure portal." border="true" lightbox="./media/create-cluster-using-arm-template/view-cost-json-view.png"::: + +1. Copy the response to an editor, for example, Visual Studio Code. +1. Modify the response with the following changes to turn it into a valid ARM template. ++ * Remove the following objects: + * `id`, `systemData` + * `deploymentId`, `provisioningState`, and `status` under the properties object. ++ * Change the "name" value to `<your clusterpool name>/<your cluster name>`. ++ :::image type="content" source="./media/create-cluster-using-arm-template/change-cluster-name.png" alt-text="Screenshot showing how to change cluster name."::: + + * Add `"apiVersion": "2023-06-01-preview"` in the same section as name, location, and so on. ++ :::image type="content" source="./media/create-cluster-using-arm-template/api-version.png" alt-text="Screenshot showing how to modify the API version."::: ++ 1. Open [custom template](/azure/azure-resource-manager/templates/deploy-portal#deploy-resources-from-custom-template) from the Azure portal and select the "Build your own template in the editor" option. + + 1. Copy the modified response to the "resources" object in the ARM template format. For example: ++ :::image type="content" source="./media/create-cluster-using-arm-template/modify-get-response.png" alt-text="Screenshot showing how to modify the get response." border="true" lightbox="./media/create-cluster-using-arm-template/modify-get-response.png"::: ++Now, your cluster ARM template is ready. You can update the properties of the cluster and finally deploy the ARM template to refresh the resources. Learn how to [deploy an ARM template](/azure/azure-resource-manager/templates/deploy-portal). |
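Before deploying a hand-edited template like the one above, it can help to validate it first. This Azure CLI sketch isn't part of the article; the file and resource group names are placeholders.

```azurecli
# Placeholder names; validation reports template errors (for example, a wrong apiVersion) without deploying anything.
az deployment group validate \
  --resource-group myResourceGroup \
  --template-file cluster-template.json
```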
hdinsight-aks | Customize Clusters | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/customize-clusters.md | + + Title: Customize Azure HDInsight on AKS clusters +description: Add custom components to HDInsight on AKS clusters by using script actions. Script actions are Bash scripts that can be used to customize the cluster configuration. ++ Last updated : 08/29/2023+++# Customize Azure HDInsight on AKS clusters using script actions ++ +Azure HDInsight on AKS provides a configuration method called Script Actions, which invokes custom scripts to customize the cluster. These scripts can be used to install additional packages or JARs and change configuration settings. Script actions can be used only during cluster creation; script actions after cluster creation are on the roadmap. Currently, script actions are available only with Spark clusters. ++## Understand script actions ++A script action is a Bash script that runs on the service components in an HDInsight on AKS cluster. ++The characteristics and features of script actions are as follows: ++- The Bash script URI (the location to access the file) has to be publicly accessible from the HDInsight on AKS resource provider and the cluster. +- The following are possible storage locations for scripts: + - An ADLS Gen2 account + - An Azure Storage account (the storage has to be publicly accessible) + - The Bash script URI format for ADLS Gen2 is `abfs://<container>@<datalakestoreaccountname>.dfs.core.windows.net/path_to_file.sh` + - The Bash script URI format for Azure Storage is `wasb://<container>@<azurestorageaccountname>.blob.core.windows.net/path_to_file.sh` +- Script actions can be restricted to run on only certain service component types, for example, Resource Manager, Node Manager, Livy, Jupyter, Zeppelin, and Metastore. +- Script actions are persisted. + - Persisted script actions must have a unique name. + - Persisted scripts are used to customize the service components. + - When the service components are scaled up, the persisted script action is applied to them as well. +- Script actions can accept parameters that are required by the script during execution. +- You need permissions to create a cluster in order to execute script actions. ++ > [!IMPORTANT] + > * Script actions that remove or modify service files on the nodes may impact service health and availability. Use discretion and review scripts before executing them. + > * There's no automatic way to undo the changes made by a script action. ++## Methods for using script actions ++You have the option of configuring a script action to run during cluster creation. ++> [!NOTE] +> Configuration of script actions on an existing cluster is part of the roadmap. ++### Script action during the cluster creation process ++In HDInsight on AKS, the script is automatically persisted. A failure in the script can cause the cluster creation process to fail. ++The following diagram illustrates when the script action runs during the creation process: ++ +**The script runs while the HDInsight on AKS cluster is being provisioned. The script runs in parallel on all the specified nodes in the cluster.** ++> [!IMPORTANT] +> * During cluster creation, you can use many script actions at once. +> * These scripts are invoked in the order in which they were specified, and not in parallel. ++### Next steps ++* How to [manage script actions](./manage-script-actions.md) |
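As an illustration of what such a script might contain (not taken from the article), here is a minimal Bash sketch of a script action that stages an extra JAR; the URL and target directory are placeholders.

```bash
#!/bin/bash
# Hypothetical script action: download an extra connector JAR into a library directory.
# The JAR URL and target directory are placeholders - adapt them to your cluster's layout.
set -euo pipefail

JAR_URL="https://example.org/libs/example-connector-1.0.0.jar"
TARGET_DIR="/opt/custom-libs"

mkdir -p "$TARGET_DIR"
curl -fsSL "$JAR_URL" -o "$TARGET_DIR/$(basename "$JAR_URL")"
echo "Script action finished: $(basename "$JAR_URL") placed in $TARGET_DIR"
```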
hdinsight-aks | Faq | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/faq.md | + + Title: HDInsight on AKS FAQ +description: HDInsight on AKS frequently asked questions. ++ Last updated : 08/29/2023+++# HDInsight on AKS - Frequently asked questions ++This article addresses some common questions about Azure HDInsight on AKS. ++## General ++* What is HDInsight on AKS? ++ HDInsight on AKS is a new HDInsight version, which provides an enterprise-ready managed cluster service with emerging open-source analytics projects like Apache Flink (for streaming), Trino (for ad hoc analytics and BI), and Apache Spark. For more information, see [Overview](./overview.md). ++* What cluster shapes does HDInsight on AKS support? ++ HDInsight on AKS supports Trino, Apache Flink, and Apache Spark to start with. Other cluster shapes, such as Kafka and Hive, are on the roadmap. + +* How do I get started with HDInsight on AKS? ++ To get started, visit Azure Marketplace, search for the Azure HDInsight on AKS service, and refer to [getting started](./quickstart-create-cluster.md). ++* What happens to existing HDInsight on VM and the clusters I'm running today? ++ There are no changes to existing HDInsight (HDInsight on VM). All your existing clusters continue to run, and you can continue to create and scale new HDInsight clusters. ++* Which operating system is supported with HDInsight on AKS? ++ HDInsight on AKS is based on Mariner OS. For more information, see [OS Version](./release-notes/hdinsight-aks-release-notes.md#operating-system-version). ++* In which regions is HDInsight on AKS available? ++ For a list of supported regions, refer to [Region availability](./overview.md#region-availability-public-preview). ++* What's the cost to deploy an HDInsight on AKS cluster? ++ For more information about pricing, see HDInsight on AKS pricing. ++## Cluster management ++* Can I run multiple clusters simultaneously? ++ Yes, you can run as many clusters as you want per cluster pool simultaneously. However, make sure you aren't constrained by the quota for your subscription. The maximum number of nodes allowed in a cluster pool is 250 (in public preview). ++* Can I install or add more plugins/libraries on my cluster? ++ Yes, you can install custom plugins and libraries depending on the cluster shape. + * For Trino, refer to [Install custom plugins](./trino/trino-custom-plugins.md). + * For Spark, refer to [Library management in Spark](./spark/library-management.md). + +* Can I SSH into my cluster? ++ Yes, you can SSH into your cluster via webssh and execute queries and submit jobs directly from there. ++## Metastore ++* Can I use an external metastore to connect to my cluster? ++ Yes, you can use an external metastore. However, we support only Azure SQL Database as an external custom metastore. ++* Can I share a metastore across multiple clusters? ++ Yes, you can share a metastore across multiple HDInsight on AKS clusters. ++* What's the version of Hive metastore supported? ++ Hive metastore version 3.1.2. ++## Workloads ++### Trino ++* What is Trino? ++ Trino is an open-source federated and distributed SQL query engine, which allows you to query data residing on different data sources without moving it to a central data warehouse. + You can query the data using ANSI SQL; there's no need to learn a new language. For more information, see [Trino overview](./trino/trino-overview.md). ++* Which connectors do you support? ++ HDInsight on AKS Trino supports multiple connectors. For more information, see this list of [Trino connectors](./trino/trino-connectors.md). + We keep adding new connectors as they become available in the open-source version. ++* Can I add catalogs to an existing cluster? ++ Yes, you can add supported catalogs to the existing cluster. For more information, see [Add catalogs to an existing cluster](./trino/trino-add-catalogs.md). + +### Apache Flink ++* What is Apache Flink? ++ Apache Flink is a best-in-class open-source analytic engine for stream processing and performing stateful computation over unbounded and bounded data streams. It can perform computations at in-memory speed and at any scale. + Flink on HDInsight on AKS offers managed open-source Apache Flink. For more information, see [Flink overview](./flink/flink-overview.md). ++* Do you support both session and app mode in Apache Flink? ++ In HDInsight on AKS, Flink currently supports session mode clusters. ++* What is state backend management, and how is it done in HDInsight on AKS? ++ Backends determine where state is stored. When checkpointing is activated, state is persisted upon checkpoints to guard against data loss and recover consistently. How the state is represented internally, and how and where it's persisted upon checkpoints, depends on the chosen state backend. For more information, see [Flink overview](./flink/flink-overview.md). ++### Apache Spark ++* What is Apache Spark? ++ Apache Spark is a data processing framework that can quickly perform processing tasks on large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. ++* What language APIs are supported in Spark? ++ Azure HDInsight on AKS supports Python and Scala. ++* Are external metastores supported in HDInsight on AKS Spark? ++ HDInsight on AKS supports external metastore connectivity. Currently, only Azure SQL Database is supported as the external metastore. ++* What are the various ways to submit jobs in HDInsight on AKS Spark? ++ You can submit jobs on HDInsight on AKS Spark using Jupyter Notebook, Zeppelin Notebook, the SDK, and the cluster terminal. For more information, see [Submit and Manage Jobs on a Spark cluster in HDInsight on AKS](./spark/submit-manage-jobs.md) |
hdinsight-aks | Assign Kafka Topic Event Message To Azure Data Lake Storage Gen2 | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/assign-kafka-topic-event-message-to-azure-data-lake-storage-gen2.md | + + Title: Write event messages into Azure Data Lake Storage Gen2 with DataStream API +description: Learn how to write event messages into Azure Data Lake Storage Gen2 with the DataStream API ++ Last updated : 08/29/2023+++# Write event messages into Azure Data Lake Storage Gen2 with DataStream API +++Apache Flink uses file systems to consume and persistently store data, both for the results of applications and for fault tolerance and recovery. In this article, learn how to write event messages into Azure Data Lake Storage Gen2 with the DataStream API. ++## Prerequisites ++* [HDInsight on AKS Apache Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* [HDInsight Kafka](../../hdinsight/kafk) + * Ensure the network settings are taken care of as described in [Using HDInsight Kafka](../flink/process-and-consume-data.md); that is, make sure HDInsight on AKS Flink and HDInsight Kafka are in the same virtual network +* Use MSI to access ADLS Gen2 +* IntelliJ for development on an Azure VM in the HDInsight on AKS virtual network ++## Apache Flink FileSystem connector ++This filesystem connector provides the same guarantees for both BATCH and STREAMING and is designed to provide exactly-once semantics for STREAMING execution. For more information, see [Flink DataStream Filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem). ++## Apache Kafka Connector ++Flink provides an Apache Kafka connector for reading data from and writing data to Kafka topics with exactly-once guarantees.
For more information, see [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka) ++## Build the project for Apache Flink ++**pom.xml on IntelliJ IDEA** ++``` xml +<properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> + <kafka.version>3.2.0</kafka.version> + </properties> + <dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + </dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` ++**Program for ADLS Gen2 Sink** ++*abfsGen2.java* ++> [!Note] +> Replace [HDInsight Kafka](../../hdinsight/kafk)bootStrapServers with your own brokers for Kafka 2.4 or 3.2 ++``` java +package contoso.example; ++import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.serialization.SimpleStringEncoder; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.configuration.MemorySize; +import org.apache.flink.connector.file.sink.FileSink; +import org.apache.flink.connector.kafka.source.KafkaSource; +import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; +import org.apache.flink.core.fs.Path; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy; ++import java.time.Duration; ++public class KafkaSinkToGen2 { + public static void main(String[] args) throws Exception { + // 1. get stream execution env + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ++ // 1. 
read kafka message as stream input, update your broker ip's + String brokers = "<update-broker-ip>:9092,<update-broker-ip>:9092,<update-broker-ip>:9092"; + KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers(brokers) + .setTopics("click_events") + .setGroupId("my-group") + .setStartingOffsets(OffsetsInitializer.earliest()) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++ DataStream<String> stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source"); + stream.print(); ++ // 3. sink to gen2, update container name and storage path + String outputPath = "abfs://<container-name>@<storage-path>.dfs.core.windows.net/flink/data/click_events"; + final FileSink<String> sink = FileSink + .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8")) + .withRollingPolicy( + DefaultRollingPolicy.builder() + .withRolloverInterval(Duration.ofMinutes(2)) + .withInactivityInterval(Duration.ofMinutes(3)) + .withMaxPartSize(MemorySize.ofMebiBytes(5)) + .build()) + .build(); ++ stream.sinkTo(sink); ++ // 4. run stream + env.execute("Kafka Sink To Gen2"); + } +} ++``` ++**Submit the job on Flink Dashboard UI** ++We are using Maven to package a jar onto local and submitting to Flink, and using Kafka to sink into ADLS Gen2 +++**Validate streaming data on ADLS Gen2** ++We are seeing the `click_events` streaming into ADLS Gen2 +++You can specify a rolling policy that rolls the in-progress part file on any of the following three conditions: ++``` java +.withRollingPolicy( + DefaultRollingPolicy.builder() + .withRolloverInterval(Duration.ofMinutes(5)) + .withInactivityInterval(Duration.ofMinutes(3)) + .withMaxPartSize(MemorySize.ofMebiBytes(5)) + .build()) +``` ++## Reference +- [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka) +- [Flink DataStream Filesystem](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem) + |
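As a small illustrative addition (not from the article), packaging the project before uploading it through the Flink Dashboard UI typically looks like the following; it assumes the module layout matches the pom.xml shown above.

```bash
# Build the fat JAR defined by the pom.xml above; the resulting artifact lands in target/.
mvn clean package -DskipTests
ls target/*.jar
```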
hdinsight-aks | Azure Databricks | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/azure-databricks.md | + + Title: Incorporate Flink DataStream into Azure Databricks Delta Lake Table +description: Learn about incorporate Flink DataStream into Azure Databricks Delta Lake Table in HDInsight on AKS - Apache Flink ++ Last updated : 10/05/2023+++# Incorporate Flink DataStream into Azure Databricks Delta Lake Table ++This example shows how to sink stream data landed into Azure ADLS Gen2 from HDInsight Flink cluster on AKS applications into Delta Lake tables using Azure Databricks Auto Loader. ++## Prerequisites ++- [HDInsight Flink 1.16.0 on AKS](./flink-create-cluster-portal.md) +- [HDInsight Kafka 3.2.0](../../hdinsight/kafk) +- [Azure Databricks](/azure/databricks/getting-started/) in the same VNET as HDInsight on AKS +- [ADLS Gen2](/azure/databricks/getting-started/connect-to-azure-storage/) and Service Principal ++## Azure Databricks Auto Loader ++Databricks Auto Loader makes it easy to stream data land into object storage from Flink applications into Delta Lake tables. [Auto Loader](/azure/databricks/ingestion/auto-loader/) provides a Structured Streaming source called cloudFiles. ++Here are the steps how you can use data from Flink in Azure Databricks delta live tables. ++### Create Kafka table on Flink SQL ++In this step, you can create Kafka table and ADLS Gen2 on Flink SQL. For the purpose of this document, we are using a airplanes_state_real_time table, you can use any topic of your choice. ++You are required to update the broker IPs with your Kafka cluster in the code snippet. ++```SQL +CREATE TABLE kafka_airplanes_state_real_time ( + `date` STRING, + `geo_altitude` FLOAT, + `icao24` STRING, + `latitude` FLOAT, + `true_track` FLOAT, + `velocity` FLOAT, + `spi` BOOLEAN, + `origin_country` STRING, + `minute` STRING, + `squawk` STRING, + `sensors` STRING, + `hour` STRING, + `baro_altitude` FLOAT, + `time_position` BIGINT, + `last_contact` BIGINT, + `callsign` STRING, + `event_time` STRING, + `on_ground` BOOLEAN, + `category` STRING, + `vertical_rate` FLOAT, + `position_source` INT, + `current_time` STRING, + `longitude` FLOAT + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'airplanes_state_real_time', + 'scan.startup.mode' = 'latest-offset', + 'properties.bootstrap.servers' = '10.0.0.38:9092,10.0.0.39:9092,10.0.0.40:9092', + 'format' = 'json' +); +``` +Next, you can create ADLSgen2 table on Flink SQL. ++Update the container-name and storage-account-name in the code snippet with your ADLS Gen2 details. ++```SQL +CREATE TABLE adlsgen2_airplanes_state_real_time ( + `date` STRING, + `geo_altitude` FLOAT, + `icao24` STRING, + `latitude` FLOAT, + `true_track` FLOAT, + `velocity` FLOAT, + `spi` BOOLEAN, + `origin_country` STRING, + `minute` STRING, + `squawk` STRING, + `sensors` STRING, + `hour` STRING, + `baro_altitude` FLOAT, + `time_position` BIGINT, + `last_contact` BIGINT, + `callsign` STRING, + `event_time` STRING, + `on_ground` BOOLEAN, + `category` STRING, + `vertical_rate` FLOAT, + `position_source` INT, + `current_time` STRING, + `longitude` FLOAT + ) WITH ( + 'connector' = 'filesystem', + 'path' = 'abfs://<container-name>@<storage-account-name>/flink/airplanes_state_real_time/', + 'format' = 'json' + ); +``` ++Further, you can insert Kafka table into ADLSgen2 table on Flink SQL. 
+++### Validate the streaming job on Flink +++### Check data sink from Kafka in Azure Storage on Azure portal +++### Authentication of Azure Storage and Azure Databricks notebook ++ADLS Gen2 supports OAuth 2.0 with your Azure AD application service principal for authentication from an Azure Databricks notebook, which you can then mount into Azure Databricks DBFS. ++**Let's get the service principal app ID, tenant ID, and secret key.** +++**Grant the service principal the Storage Blob Data Owner role in the Azure portal** +++**Mount ADLS Gen2 into DBFS from an Azure Databricks notebook** +++**Prepare notebook** ++Let's write the following code: +```SQL +%sql +CREATE OR REFRESH STREAMING TABLE airplanes_state_real_time2 +AS SELECT * FROM cloud_files("dbfs:/mnt/contosoflinkgen2/flink/airplanes_state_real_time/", "json") +``` ++### Define Delta Live Table Pipeline and run on Azure Databricks ++++### Check Delta Live Table on Azure Databricks Notebook + |
hdinsight-aks | Azure Iot Hub | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/azure-iot-hub.md | + + Title: Process real-time IoT data on Flink with Azure HDInsight on AKS +description: How to integrate Azure IoT Hub and Apache Flink ++ Last updated : 10/03/2023+++# Process real-time IoT data on Flink with Azure HDInsight on AKS ++Azure IoT Hub is a managed service hosted in the cloud that acts as a central message hub for communication between an IoT application and its attached devices. You can connect millions of devices and their backend solutions reliably and securely. Almost any device can be connected to an IoT hub. ++## Prerequisites ++1. [Create an Azure IoT Hub](/azure/iot-hub/iot-hub-create-through-portal/) +2. [Create an HDInsight on AKS Flink cluster](./flink-create-cluster-portal.md) ++## Configure Flink cluster ++Add the ABFS storage account key to your Flink cluster's configuration. ++Add the following configurations: ++`fs.azure.account.key.<your storage account's dfs endpoint> = <your storage account's shared access key>` +++## Writing the Flink job ++### Set up configuration for ABFS ++```java +Properties props = new Properties(); +props.put( + "fs.azure.account.key.<your storage account's dfs endpoint>", + "<your storage account's shared access key>" +); ++Configuration conf = ConfigurationUtils.createConfiguration(props); ++StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf); ++``` +++This setup is required for Flink to authenticate with your ABFS storage account so that it can write data to it. ++### Defining the IoT Hub source ++IoT Hub is built on top of Event Hubs and hence supports a Kafka-like API. So in our Flink job, we can define a `KafkaSource` with appropriate parameters to consume messages from IoT Hub. ++```java +String connectionString = "<your iot hub connection string>"; ++KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers("<your iot hub's service bus url>:9093") + .setTopics("<name of your iot hub>") + .setGroupId("$Default") + .setProperty("partition.discovery.interval.ms", "10000") + .setProperty("security.protocol", "SASL_SSL") + .setProperty("sasl.mechanism", "PLAIN") + .setProperty("sasl.jaas.config", String.format("org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"%s\";", connectionString)) + .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST)) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++DataStream<String> kafka = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source"); +kafka.print(); +``` ++The connection string for IoT Hub can be found here: +++Within the connection string, you can find a service bus URL (URL of the underlying Event Hubs namespace), which you need to add as a bootstrap server in your Kafka source.
In this case, it is: `iothub-ns-sagiri-iot-25146639-20dff4e426.servicebus.windows.net:9093` ++### Defining the ABFS sink ++```java +String outputPath = "abfs://<container name>@<your storage account's dfs endpoint>"; ++final FileSink<String> sink = FileSink + .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8")) + .withRollingPolicy( + DefaultRollingPolicy.builder() + .withRolloverInterval(Duration.ofMinutes(2)) + .withInactivityInterval(Duration.ofMinutes(3)) + .withMaxPartSize(MemorySize.ofMebiBytes(5)) + .build()) + .build(); ++kafka.sinkTo(sink); +``` ++### Flink job code ++```java +package org.example; ++import java.time.Duration; +import java.util.Properties; +import org.apache.flink.api.common.serialization.SimpleStringEncoder; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.configuration.ConfigurationUtils; +import org.apache.flink.configuration.MemorySize; +import org.apache.flink.connector.file.sink.FileSink; +import org.apache.flink.core.fs.Path; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.connector.kafka.source.KafkaSource; +import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; +import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy; +import org.apache.kafka.clients.consumer.OffsetResetStrategy; ++public class StreamingJob { + public static void main(String[] args) throws Throwable { ++ Properties props = new Properties(); + props.put( + "fs.azure.account.key.<your storage account's dfs endpoint>", + "<your storage account's shared access key>" + ); ++ Configuration conf = ConfigurationUtils.createConfiguration(props); ++ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf); ++ String connectionString = "<your iot hub connection string>"; ++ + KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers("<your iot hub's service bus url>:9093") + .setTopics("<name of your iot hub>") + .setGroupId("$Default") + .setProperty("partition.discovery.interval.ms", "10000") + .setProperty("security.protocol", "SASL_SSL") + .setProperty("sasl.mechanism", "PLAIN") + .setProperty("sasl.jaas.config", String.format("org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"%s\";", connectionString)) + .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST)) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); +++ DataStream<String> kafka = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source"); + kafka.print(); ++ String outputPath = "abfs://<container name>@<your storage account's dfs endpoint>"; ++ final FileSink<String> sink = FileSink + .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8")) + .withRollingPolicy( + DefaultRollingPolicy.builder() + .withRolloverInterval(Duration.ofMinutes(2)) + .withInactivityInterval(Duration.ofMinutes(3)) + .withMaxPartSize(MemorySize.ofMebiBytes(5)) + .build()) + .build(); ++ kafka.sinkTo(sink); ++ env.execute("Azure-IoTHub-Flink-ABFS"); + } +} ++``` ++#### Maven dependencies ++```xml +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + 
<version>${flink.version}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-scala_2.12</artifactId> + <version>${flink.version}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> +</dependency> +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> +</dependency> +``` +++### Submit job ++Submit job using HDInsight on AKS's [Flink job submission API](./flink-job-management.md) + |
hdinsight-aks | Change Data Capture Connectors For Apache Flink | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/change-data-capture-connectors-for-apache-flink.md | + + Title: How to perform Change Data Capture of SQL Server with DataStream API and DataStream Source. +description: Learn how to perform Change Data Capture of SQL Server with DataStream API and DataStream Source. ++ Last updated : 08/29/2023+++# Change Data Capture of SQL Server with DataStream API and DataStream Source +++Change Data Capture (CDC) is a technique you can use to track row-level changes in database tables in response to create, update, and delete operations. In this article, we use [CDC Connectors for Apache Flink®](https://github.com/ververica/flink-cdc-connectors), which offer a set of source connectors for Apache Flink. The connectors integrate [Debezium®](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/debezium/#debezium-format) as the engine to capture the data changes. ++In this article, learn how to perform Change Data Capture of SQL Server using the DataStream API. The SQLServer CDC connector can also be a DataStream source. ++## Prerequisites ++* [HDInsight on AKS Apache Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* [HDInsight Kafka](../../hdinsight/kafk) + * Ensure the network settings are taken care of as described in [Using HDInsight Kafka](../flink/process-and-consume-data.md), so that HDInsight on AKS Flink and HDInsight Kafka are in the same VNet +* Azure SQL Server +* The HDInsight Kafka cluster and the HDInsight on AKS Flink cluster are located in the same VNet +* Install [IntelliJ IDEA](https://www.jetbrains.com/idea/download/#section=windows) for development on an Azure VM that is located in the HDInsight VNet ++### SQLServer CDC Connector ++The SQLServer CDC connector is a Flink source connector that first reads a snapshot of the database and then continues to read change events with exactly-once processing, even when failures happen. The SQLServer CDC connector can also be a DataStream source. ++### Single Thread Reading ++The SQLServer CDC source can't read in parallel, because only one task can receive change events. For more information, see [SQLServer CDC Connector](https://ververica.github.io/flink-cdc-connectors/master/content/connectors/sqlserver-cdc.html). ++### DataStream Source ++The SQLServer CDC connector can also be a DataStream source. You can create a SourceFunction. ++## How does the SQLServer CDC connector work? ++To optimally configure and run a Debezium SQL Server connector, it's helpful to understand how the connector performs snapshots, streams change events, determines Kafka topic names, and uses metadata. ++- **Snapshots**: SQL Server CDC isn't designed to store a complete history of database changes. To establish a baseline for the current state of the database, the Debezium SQL Server connector uses a process called *snapshotting*. +++## Apache Flink on HDInsight on AKS ++Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Apache Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. 
++For more information, refer ++* [Apache Flink®—Stateful Computations over Data Streams](https://flink.apache.org/) +* [What is Apache Flink in HDInsight on AKS](./flink-overview.md) ++## Apache Kafka on HDInsight ++Apache Kafka is an open-source distributed streaming platform that can be used to build real-time streaming data pipelines and applications. Kafka also provides message broker functionality similar to a message queue, where you can publish and subscribe to named data streams. ++For more information, refer [Apache Kafka in Azure HDInsight](../../hdinsight/kafk) ++## Perform a test ++#### Prepare DB and table on Sqlserver ++``` +CREATE DATABASE inventory; +GO +``` +**CDC is enabled on the SQL Server database** ++``` +USE inventory; +EXEC sys.sp_cdc_enable_db; +GO +``` ++**Verify that the user has access to the CDC table** ++``` +USE inventory +GO +EXEC sys.sp_cdc_help_change_data_capture +GO +``` +> [!NOTE] +> The query returns configuration information for each table in the database that is enabled for CDC and that contains change data that the caller is authorized to access. If the result is empty, verify that the user has privileges to access both the capture instance and the CDC tables. ++**Create and populate products with single insert with many rows** ++``` +CREATE TABLE products ( +id INTEGER IDENTITY(101,1) NOT NULL PRIMARY KEY, +name VARCHAR(255) NOT NULL, +description VARCHAR(512), +weight FLOAT +); ++INSERT INTO products(name,description,weight) +VALUES ('scooter','Small 2-wheel scooter',3.14); +INSERT INTO products(name,description,weight) +VALUES ('car battery','12V car battery',8.1); +INSERT INTO products(name,description,weight) +VALUES ('12-pack drill bits','12-pack of drill bits with sizes ranging from #40 to #3',0.8); +INSERT INTO products(name,description,weight) +VALUES ('hammer','12oz carpenter''s hammer',0.75); +INSERT INTO products(name,description,weight) +VALUES ('hammer','14oz carpenter''s hammer',0.875); +INSERT INTO products(name,description,weight) +VALUES ('hammer','16oz carpenter''s hammer',1.0); +INSERT INTO products(name,description,weight) +VALUES ('rocks','box of assorted rocks',5.3); +INSERT INTO products(name,description,weight) +VALUES ('jacket','water resistent black wind breaker',0.1); +INSERT INTO products(name,description,weight) +VALUES ('spare tire','24 inch spare tire',22.2); ++EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'products', @role_name = NULL, @supports_net_changes = 0; ++-- Create some very simple orders +CREATE TABLE orders ( +id INTEGER IDENTITY(10001,1) NOT NULL PRIMARY KEY, +order_date DATE NOT NULL, +purchaser INTEGER NOT NULL, +quantity INTEGER NOT NULL, +product_id INTEGER NOT NULL, +FOREIGN KEY (product_id) REFERENCES products(id) +); ++INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('16-JAN-2016', 1001, 1, 102); +INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('17-JAN-2016', 1002, 2, 105); +INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('19-FEB-2016', 1002, 2, 106); +INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('21-FEB-2016', 1003, 1, 107); ++EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = NULL, @supports_net_changes = 0; +GO +``` +##### Maven source code on IdeaJ ++In the below snippet, we use HDInsight Kafka 2.4.1. Based on your usage, update the version of Kafka on `<kafka.version>`. 
++**maven pom.xml** ++```xml +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> ++ <groupId>org.example</groupId> + <artifactId>FlinkDemo</artifactId> + <version>1.0-SNAPSHOT</version> + <properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> + <kafka.version>2.4.1</kafka.version> // Replace with 3.2 if you're using HDInsight Kafka 3.2 + </properties> + <dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-base --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-base</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-core</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-sql-connector-elasticsearch7 --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-sql-connector-elasticsearch7</artifactId> + <version>${flink.version}</version> + <scope>provided</scope> + </dependency> + <!-- https://mvnrepository.com/artifact/com.ververica/flink-sql-connector-sqlserver-cdc --> + <dependency> + <groupId>com.ververica</groupId> + <artifactId>flink-sql-connector-sqlserver-cdc</artifactId> + <version>2.2.1</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-common --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-common</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-planner --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-planner_2.12</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-api-scala --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-api-scala_2.12</artifactId> + <version>${flink.version}</version> + </dependency> + </dependencies> +</project> +``` ++**mssqlSinkToKafka.java** ++```java +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.connector.base.DeliveryGuarantee; +import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema; +import org.apache.flink.connector.kafka.sink.KafkaSink; 
++import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.api.functions.source.SourceFunction; ++import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema; +import com.ververica.cdc.connectors.sqlserver.SqlServerSource; ++public class mssqlSinkToKafka { ++ public static void main(String[] args) throws Exception { + // 1: Stream execution environment, update the kafka brokers below. + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1); //use parallelism 1 for sink to keep message ordering ++ String kafka_brokers = "wn0-sampleka:9092,wn1-sampleka:9092,wn2-sampleka:9092"; ++ // 2. sql server source - Update your sql server name, username, password + SourceFunction<String> sourceFunction = SqlServerSource.<String>builder() + .hostname("<samplehilosqlsever>.database.windows.net") + .port(1433) + .database("inventory") // monitor sqlserver database + .tableList("dbo.orders") // monitor products table + .username("username") // username + .password("password") // password + .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String + .build(); ++ DataStream<String> stream = env.addSource(sourceFunction); + stream.print(); ++ // 3. sink order table transaction to kafka + KafkaSink<String> sink = KafkaSink.<String>builder() + .setBootstrapServers(kafka_brokers) + .setRecordSerializer(KafkaRecordSerializationSchema.builder() + .setTopic("mssql_order") + .setValueSerializationSchema(new SimpleStringSchema()) + .build() + ) + .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE) + .build(); + stream.sinkTo(sink); ++ // 4. run stream + env.execute(); + } +} +``` ++### Validation ++- Insert four rows into table order on sqlserver, then check on Kafka ++ :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/check-kafka-output.png" alt-text="Screenshot showing how to check Kafka output."::: + +- Insert more rows on sqlserver ++ :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/insert-more-rows-on-sql-server.png" alt-text="Screenshot showing how to insert more rows on sqlserver."::: ++- Check changes on Kafka ++ :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/check-changes-on-kafka.png" alt-text="Screenshot showing changes made in Kafka after inserting four rows."::: + +- Update `product_id=107` on sqlserver + + :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/update-product-id-107.png" alt-text="Screenshot showing update for product ID 107."::: + + - Check changes on Kafka for the updated ID 107 + + :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/check-changes-on-kafka-for-id-107.png" alt-text="Screenshot showing changes in Kafka for updated ID 107."::: + + - Delete `product_id=107` on sqlserver ++ :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/delete-product-id-107-on-sql-server.png" alt-text="Screenshot showing how to delete product ID 107."::: + + :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/delete-product-id-107-output.png" alt-text="Screenshot showing deleted items on SQL Server."::: + + - Check changes on Kafka for the deleted `product_id=107` + + :::image type="content" 
source="./media/change-data-capture-connectors-for-apache-flink/check-changes-on-kafka-for-deleted-records.png" alt-text="Screenshot showing in Kafka for deleted items."::: + + - The following JSON message on Kafka shows the change event in JSON format. + + :::image type="content" source="./media/change-data-capture-connectors-for-apache-flink/json-output.png" alt-text="Screenshot showing JSON output."::: + +### Reference ++* [SQLServer CDC Connector](https://github.com/ververic) is licensed under [Apache 2.0 License](https://github.com/ververica/flink-cdc-connectors/blob/master/LICENSE) +* [Apache Kafka in Azure HDInsight](../../hdinsight/kafk) +* [Flink Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/#behind-the-scene) |
hdinsight-aks | Cosmos Db For Apache Cassandra | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/cosmos-db-for-apache-cassandra.md | + + Title: Using Azure Cosmos DB (Apache Cassandra) with HDInsight on AKS - Flink +description: Learn how to Sink HDInsight Kafka message into Azure Cosmos DB for Apache Cassandra, with Apache Flink running on HDInsight on AKS. ++ Last updated : 08/29/2023+++# Sink Kafka messages into Azure Cosmos DB for Apache Cassandra, with HDInsight on AKS - Flink +++This example uses [HDInsight on AKS Flink 1.16.0](../flink/flink-overview.md) to sink [HDInsight Kafka 3.2.0](/azure/hdinsight/kafka/apache-kafka-introduction) messages into [Azure Cosmos DB for Apache Cassandra](/azure/cosmos-db/cassandra/introduction) ++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* [HDInsight 5.1 Kafka 3.2](../../hdinsight/kafk) +* [Azure Cosmos DB for Apache Cassandra](../../cosmos-db/cassandra/index.yml) +* Prepare an Ubuntu VM as maven project development env in the same VNet as HDInsight on AKS. ++## Azure Cosmos DB for Apache Cassandra ++Azure Cosmos DB for Apache Cassandra can be used as the data store for apps written for Apache Cassandra. This compatibility means that by using existing Apache drivers compliant with CQLv4, your existing Cassandra application can now communicate with the API for Cassandra. ++For more information, see the following links. ++* [Azure Cosmos DB for Apache Cassandra](../../cosmos-db/cassandr). +* [Create a API for Cassandra account in Azure Cosmos DB](../../cosmos-db/cassandr). +++Get credentials uses it on Stream source code: +++## Implementation ++**On an Ubuntu VM, let's prepare the development environment** ++### Cloning repository of Azure Samples ++Refer GitHub readme to download maven, clone this repository using `Azure-Samples/azure-cosmos-db-cassandra-java-getting-started.git` from +[Azure Samples ](https://github.com/Azure-Samples/azure-cosmos-db-cassandra-java-getting-started) ++### Updating maven project for Cassandra ++Go to maven project folder **azure-cosmos-db-cassandra-java-getting-started-main** and update the changes required for this example ++**maven pom.xml** +``` xml ++<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> ++ <groupId>com.azure.cosmosdb.cassandra</groupId> + <artifactId>cosmosdb-cassandra-examples</artifactId> + <version>1.0-SNAPSHOT</version> + <dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>1.16.0</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>1.16.0</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>1.16.0</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>1.16.0</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>1.16.0</version> + 
</dependency> + <dependency> + <groupId>com.datastax.cassandra</groupId> + <artifactId>cassandra-driver-core</artifactId> + <version>3.3.0</version> + </dependency> + <dependency> + <groupId>com.datastax.cassandra</groupId> + <artifactId>cassandra-driver-mapping</artifactId> + <version>3.1.4</version> + </dependency> + <dependency> + <groupId>com.datastax.cassandra</groupId> + <artifactId>cassandra-driver-extras</artifactId> + <version>3.1.4</version> + </dependency> + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-api</artifactId> + <version>1.7.5</version> + </dependency> + <dependency> + <groupId>org.slf4j</groupId> + <artifactId>slf4j-log4j12</artifactId> + <version>1.7.5</version> + </dependency> + </dependencies> ++ <build> + <plugins> + <plugin> + <artifactId>maven-assembly-plugin</artifactId> + <configuration> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + <finalName>cosmosdb-cassandra-examples</finalName> + <appendAssemblyId>false</appendAssemblyId> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-compiler-plugin</artifactId> + <configuration> + <source>1.8</source> + <target>1.8</target> + </configuration> + </plugin> + </plugins> + </build> ++</project> ++``` +**Cosmos DB for Apache Cassandra's connection configuration** ++You're required to update your host-name and user-name, and keys in the below snippet. ++``` +root@flinkvm:/home/flinkvm/azure-cosmos-db-cassandra-java-getting-started-main/src/main/resources# cat config.properties +###Cassandra endpoint details on cosmosdb +cassandra_host=<update-host-name>.cassandra.cosmos.azure.com +cassandra_port = 10350 +cassandra_username=<update-user-name> +cassandra_password=mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +#ssl_keystore_file_path=<SSL key store file location> +#ssl_keystore_password=<SSL key store password> +``` ++**source structure** ++``` +root@flinkvm:/home/flinkvm/azure-cosmos-db-cassandra-java-getting-started-main/src/main/java/com/azure/cosmosdb/cassandra# ll +total 24 +drwxr-xr-x 5 root root 4096 May 12 12:46 ./ +drwxr-xr-x 3 root root 4096 Apr 9 2020 ../ +-rw-r--r-- 1 root root 1105 Apr 9 2020 User.java +drwxr-xr-x 2 root root 4096 May 15 03:53 examples/ +drwxr-xr-x 2 root root 4096 Apr 9 2020 repository/ +drwxr-xr-x 2 root root 4096 May 15 02:43 util/ +``` ++**util folder** +**CassandraUtils.java** ++> [!NOTE] +> Change ssl_keystore_file_path depends on the java cert location. 
On HDInsight on AKS Apache Flink, the path is `/usr/lib/jvm/msopenjdk-11-jre/lib/security` ++``` java +package com.azure.cosmosdb.cassandra.util; ++import com.datastax.driver.core.*; ++import javax.net.ssl.*; +import java.io.File; +import java.io.FileInputStream; +import java.io.InputStream; +import java.security.*; ++/** + * Cassandra utility class to handle the Cassandra Sessions + */ +public class CassandraUtils { ++ private Cluster cluster; + private Configurations config = new Configurations(); + private String cassandraHost = "<cassandra-host-ip>"; + private int cassandraPort = 10350; + private String cassandraUsername = "localhost"; + private String cassandraPassword = "<cassandra-password>"; + private File sslKeyStoreFile = null; + private String sslKeyStorePassword = "<keystore-password>"; +++ /** + * This method creates a Cassandra Session based on the the end-point details given in config.properties. + * This method validates the SSL certificate based on ssl_keystore_file_path & ssl_keystore_password properties. + * If ssl_keystore_file_path & ssl_keystore_password are not given then it uses 'cacerts' from JDK. + * @return Session Cassandra Session + */ + public Session getSession() { ++ try { + //Load cassandra endpoint details from config.properties + loadCassandraConnectionDetails(); ++ final KeyStore keyStore = KeyStore.getInstance("JKS"); + try (final InputStream is = new FileInputStream(sslKeyStoreFile)) { + keyStore.load(is, sslKeyStorePassword.toCharArray()); + } ++ final KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory + .getDefaultAlgorithm()); + kmf.init(keyStore, sslKeyStorePassword.toCharArray()); + final TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory + .getDefaultAlgorithm()); + tmf.init(keyStore); ++ // Creates a socket factory for HttpsURLConnection using JKS contents. + final SSLContext sc = SSLContext.getInstance("TLSv1.2"); + sc.init(kmf.getKeyManagers(), tmf.getTrustManagers(), new java.security.SecureRandom()); ++ JdkSSLOptions sslOptions = RemoteEndpointAwareJdkSSLOptions.builder() + .withSSLContext(sc) + .build(); + cluster = Cluster.builder() + .addContactPoint(cassandraHost) + .withPort(cassandraPort) + .withCredentials(cassandraUsername, cassandraPassword) + .withSSL(sslOptions) + .build(); ++ return cluster.connect(); + } catch (Exception ex) { + ex.printStackTrace(); + } + return null; + } ++ public Cluster getCluster() { + return cluster; + } ++ /** + * Closes the cluster and Cassandra session + */ + public void close() { + cluster.close(); + } ++ /** + * Loads Cassandra end-point details from config.properties. + * @throws Exception + */ + private void loadCassandraConnectionDetails() throws Exception { + cassandraHost = config.getProperty("cassandra_host"); + cassandraPort = Integer.parseInt(config.getProperty("cassandra_port")); + cassandraUsername = config.getProperty("cassandra_username"); + cassandraPassword = config.getProperty("cassandra_password"); + String ssl_keystore_file_path = config.getProperty("ssl_keystore_file_path"); + String ssl_keystore_password = config.getProperty("ssl_keystore_password"); ++ // If ssl_keystore_file_path, build the path using JAVA_HOME directory. 
+ if (ssl_keystore_file_path == null || ssl_keystore_file_path.isEmpty()) { + String javaHomeDirectory = System.getenv("JAVA_HOME"); + if (javaHomeDirectory == null || javaHomeDirectory.isEmpty()) { + throw new Exception("JAVA_HOME not set"); + } + ssl_keystore_file_path = new StringBuilder(javaHomeDirectory).append("/lib/security/cacerts").toString(); + } ++ sslKeyStorePassword = (ssl_keystore_password != null && !ssl_keystore_password.isEmpty()) ? + ssl_keystore_password : sslKeyStorePassword; ++ sslKeyStoreFile = new File(ssl_keystore_file_path); ++ if (!sslKeyStoreFile.exists() || !sslKeyStoreFile.canRead()) { + throw new Exception(String.format("Unable to access the SSL Key Store file from %s", ssl_keystore_file_path)); + } + } +} +``` ++**Configurations.java** ++``` java +package com.azure.cosmosdb.cassandra.util; ++import org.slf4j.Logger; +import org.slf4j.LoggerFactory; ++import java.io.IOException; +import java.io.InputStream; +import java.util.Properties; ++/** + * Configuration utility to read the configurations from properties file + */ +public class Configurations { + private static final Logger LOGGER = LoggerFactory.getLogger(Configurations.class); + private static String PROPERTY_FILE = "config.properties"; + private static Properties prop = null; ++ private void loadProperties() throws IOException { + InputStream input = getClass().getClassLoader().getResourceAsStream(PROPERTY_FILE); + if (input == null) { + LOGGER.error("Sorry, unable to find {}", PROPERTY_FILE); + return; + } + prop = new Properties(); + prop.load(input); + } ++ public String getProperty(String propertyName) throws IOException { + if (prop == null) { + loadProperties(); + } + return prop.getProperty(propertyName); ++ } +} +``` ++**Examples folder** ++**CassandraSink.java** +``` java +package com.azure.cosmosdb.cassandra.examples; ++import com.datastax.driver.core.PreparedStatement; +import com.datastax.driver.core.Session; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.streaming.api.functions.sink.SinkFunction; +import com.azure.cosmosdb.cassandra.repository.UserRepository; +import com.azure.cosmosdb.cassandra.util.CassandraUtils; ++public class CassandraSink implements SinkFunction<Tuple3<Integer, String, String>> { ++ @Override + public void invoke(Tuple3<Integer, String, String> value, Context context) throws Exception { ++ CassandraUtils utils = new CassandraUtils(); + Session cassandraSession = utils.getSession(); + try { + UserRepository repository = new UserRepository(cassandraSession); ++ //Insert rows into user table + PreparedStatement preparedStatement = repository.prepareInsertStatement(); + repository.insertUser(preparedStatement, value.f0, value.f1, value.f2); ++ } finally { + if (null != utils) utils.close(); + if (null != cassandraSession) cassandraSession.close(); + } + } +} +``` ++**main class: CassandraDemo.java** ++> [!Note] +> * Replace Kafka Broker IPs with your cluster broker IPs +> * Prepare topic +> * user `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic user --bootstrap-server wn0-flinkd:9092` ++``` java +package com.azure.cosmosdb.cassandra.examples; ++import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.api.common.typeinfo.Types; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.connector.kafka.source.KafkaSource; +import 
org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; ++public class CassandraDemo { + public static void main(String[] args) throws Exception { + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1); ++ // 1. read kafka message as stream input, update the broker IPs from your Kafka setup + String brokers = "<update-broker-ips>:9092,<update-broker-ips>:9092,<update-broker-ips>:9092"; ++ KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers(brokers) + .setTopics("user") + .setGroupId("my-group") + .setStartingOffsets(OffsetsInitializer.earliest()) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++ DataStream<String> kafka = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source"); + kafka.print(); ++ DataStream<Tuple3<Integer,String,String>> dataStream = kafka.map(line-> { + String[] fields = line.split(","); + int v1 = Integer.parseInt(fields[0]); + Tuple3<Integer,String,String> tuple3 = Tuple3.of(v1,fields[1],fields[2]); + return tuple3; + }).returns(Types.TUPLE(Types.INT,Types.STRING,Types.STRING)); +++ dataStream.addSink(new CassandraSink()); ++ // 4. run stream + env.execute("sink Kafka to Cosmos DB for Apache Cassandra"); + } +} +``` ++### Building the project ++Run **mvn clean install** from azure-cosmos-db-cassandra-java-getting-started-main folder to build the project. This command generates cosmosdb-cassandra-examples.jar under target folder ++``` +root@flinkvm:/home/flinkvm/azure-cosmos-db-cassandra-java-getting-started-main/target# ll +total 91156 +drwxr-xr-x 7 root root 4096 May 15 03:54 ./ +drwxr-xr-x 7 root root 4096 May 15 03:54 ../ +drwxr-xr-x 2 root root 4096 May 15 03:54 archive-tmp/ +drwxr-xr-x 3 root root 4096 May 15 03:54 classes/ +-rw-r--r-- 1 root root 15542 May 15 03:54 cosmosdb-cassandra-examples-1.0-SNAPSHOT.jar +-rw-r--r-- 1 root root 93290819 May 15 03:54 cosmosdb-cassandra-examples.jar +drwxr-xr-x 3 root root 4096 May 15 03:54 generated-sources/ +drwxr-xr-x 2 root root 4096 May 15 03:54 maven-archiver/ +drwxr-xr-x 3 root root 4096 May 15 03:54 maven-status/ +``` ++### Uploading the jar for Apache Flink Job submission ++Upload jar into Azure storage and wget into webssh ++``` +msdata@pod-0 [ ~ ]$ ls -l cosmosdb-cassandra-examples.jar +-rw-r-- 1 msdata msdata 93290819 May 15 04:02 cosmosdb-cassandra-examples.jar +``` ++## Preparing Cosmos DB KeyStore and Table ++Run UserProfile class in /azure-cosmos-db-cassandra-java-getting-started-main/src/main/java/com/azure/cosmosdb/cassandra/examples to create Azure Cosmos DB's keystore and table. 
++``` +bin/flink run -c com.azure.cosmosdb.cassandra.examples.UserProfile -j cosmosdb-cassandra-examples.jar +``` ++## Sink Kafka Topics into Cosmos DB (Apache Cassandra) ++Run CassandraDemo class to sink Kafka topic into Cosmos DB for Apache Cassandra ++``` +bin/flink run -c com.azure.cosmosdb.cassandra.examples.CassandraDemo -j cosmosdb-cassandra-examples.jar +``` +++## Validate Apache Flink Job Submission ++Check job on HDInsight on AKS Flink UI +++## Producing Messages in Kafka ++Produce message into Kafka topic ++``` python +sshuser@hn0-flinkd:~$ cat user.py +import time +from datetime import datetime +import random ++user_set = [ + 'John', + 'Mike', + 'Lucy', + 'Tom', + 'Machael', + 'Lily', + 'Zark', + 'Tim', + 'Andrew', + 'Pick', + 'Sean', + 'Luke', + 'Chunck' +] ++city_set = [ + 'Atmore', + 'Auburn', + 'Bessemer', + 'Birmingham', + 'Chickasaw', + 'Clanton', + 'Decatur', + 'Florence', + 'Greenville', + 'Jasper', + 'Huntsville', + 'Homer', + 'Homer' +] ++def main(): + while True: + unique_id = str(int(time.time())) + if random.randrange(10) < 4: + city = random.choice(city_set[:3]) + else: + city = random.choice(city_set) + user = random.choice(user_set) + print(unique_id + "," + user + "," + city ) + time.sleep(1) ++if __name__ == "__main__": + main() +``` ++``` +sshuser@hn0-flinkd:~$ python user.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-flinkd:9092 --topic user & +[2] 11516 +``` ++## Check table on Cosmos DB for Apache Cassandra on Azure portal +++### Preferences ++* [Azure Cosmos DB for Apache Cassandra](../../cosmos-db/cassandr). +* [Create a API for Cassandra account in Azure Cosmos DB](../../cosmos-db/cassandr) +* [Azure Samples ](https://github.com/Azure-Samples/azure-cosmos-db-cassandra-java-getting-started) |
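Besides checking the table in the Azure portal, you can verify row counts from the development VM with the same driver classes the sample already uses. The sketch below is an assumption: it reuses the sample's `CassandraUtils` and assumes the keyspace and table created by the `UserProfile` step are named `uprofile` and `user` (as in the Azure sample); adjust the names to match your run.

```java
package com.azure.cosmosdb.cassandra.examples;

import com.azure.cosmosdb.cassandra.util.CassandraUtils;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class VerifyRowCount {
    public static void main(String[] args) {
        CassandraUtils utils = new CassandraUtils();
        Session session = utils.getSession();
        try {
            // Assumed keyspace/table names; replace with the ones created by UserProfile.
            Row row = session.execute("SELECT COUNT(*) FROM uprofile.user").one();
            System.out.println("Rows in uprofile.user: " + row.getLong(0));
        } finally {
            if (session != null) session.close();
            utils.close();
        }
    }
}
```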
hdinsight-aks | Create Kafka Table Flink Kafka Sql Connector | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/create-kafka-table-flink-kafka-sql-connector.md | + + Title: How to create Kafka table on Apache FlinkSQL - Azure portal +description: Learn how to create Kafka table on Apache FlinkSQL ++ Last updated : 10/06/2023+++# Create Kafka table on Apache FlinkSQL +++Using this example, learn how to Create Kafka table on Apache FlinkSQL. ++## Prerequisites ++* [HDInsight Kafka](../../hdinsight/kafk) +* [HDInsight on AKS Apache Flink 1.16.0](../flink/flink-create-cluster-portal.md) ++## Kafka SQL connector on Apache Flink ++The Kafka connector allows for reading data from and writing data into Kafka topics. For more information, refer [Apache Kafka SQL Connector](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka) ++## Create a Kafka table on Apache Flink SQL ++### Prepare topic and data on HDInsight Kafka ++**Prepare messages with weblog.py** ++``` Python +import random +import json +import time +from datetime import datetime ++user_set = [ + 'John', + 'XiaoMing', + 'Mike', + 'Tom', + 'Machael', + 'Zheng Hu', + 'Zark', + 'Tim', + 'Andrew', + 'Pick', + 'Sean', + 'Luke', + 'Chunck' +] ++web_set = [ + 'https://google.com', + 'https://facebook.com?id=1', + 'https://tmall.com', + 'https://baidu.com', + 'https://taobao.com', + 'https://aliyun.com', + 'https://apache.com', + 'https://flink.apache.com', + 'https://hbase.apache.com', + 'https://github.com', + 'https://gmail.com', + 'https://stackoverflow.com', + 'https://python.org' +] ++def main(): + while True: + if random.randrange(10) < 4: + url = random.choice(web_set[:3]) + else: + url = random.choice(web_set) ++ log_entry = { + 'userName': random.choice(user_set), + 'visitURL': url, + 'ts': datetime.now().strftime("%m/%d/%Y %H:%M:%S") + } ++ print(json.dumps(log_entry)) + time.sleep(0.05) ++if __name__ == "__main__": + main() +``` ++**Pipeline to Kafka topic** ++``` +sshuser@hn0-contsk:~$ python weblog.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic click_events +``` ++**Other commands:** ++``` +-- create topic +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic click_events --bootstrap-server wn0-contsk:9092 ++-- delete topic +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --delete --topic click_events --bootstrap-server wn0-contsk:9092 ++-- consume topic +sshuser@hn0-contsk:~$ /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server wn0-contsk:9092 --topic click_events --from-beginning +{"userName": "Luke", "visitURL": "https://flink.apache.com", "ts": "06/26/2023 14:33:43"} +{"userName": "Tom", "visitURL": "https://stackoverflow.com", "ts": "06/26/2023 14:33:43"} +{"userName": "Chunck", "visitURL": "https://google.com", "ts": "06/26/2023 14:33:44"} +{"userName": "Chunck", "visitURL": "https://facebook.com?id=1", "ts": "06/26/2023 14:33:44"} +{"userName": "John", "visitURL": "https://tmall.com", "ts": "06/26/2023 14:33:44"} +{"userName": "Andrew", "visitURL": "https://facebook.com?id=1", "ts": "06/26/2023 14:33:44"} +{"userName": "John", "visitURL": "https://tmall.com", "ts": "06/26/2023 14:33:44"} +{"userName": "Pick", "visitURL": "https://google.com", "ts": "06/26/2023 14:33:44"} +{"userName": "Mike", "visitURL": "https://tmall.com", "ts": "06/26/2023 14:33:44"} +{"userName": "Zheng Hu", "visitURL": "https://tmall.com", "ts": 
"06/26/2023 14:33:44"} +{"userName": "Luke", "visitURL": "https://facebook.com?id=1", "ts": "06/26/2023 14:33:44"} +{"userName": "John", "visitURL": "https://flink.apache.com", "ts": "06/26/2023 14:33:44"} ++``` ++### Apache Flink SQL client ++Detailed instructions are provided on how to use Secure Shell for [Flink SQL client](./flink-web-ssh-on-portal-to-flink-sql.md) ++### Download Kafka SQL Connector & Dependencies into SSH ++We're using the **Kafka 3.2.0** dependencies in the below step, You're required to update the command based on your Kafka version on HDInsight. +``` +wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.2.0/kafka-clients-3.2.0.jar +wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.16.0/flink-connector-kafka-1.16.0.jar +``` ++### Connect to Apache Flink SQL Client ++Let's now connect to the Flink SQL Client with Kafka SQL client jars +``` +msdata@pod-0 [ /opt/flink-webssh ]$ bin/sql-client.sh -j flink-connector-kafka-1.16.0.jar -j kafka-clients-3.2.0.jar +``` ++### Create Kafka table on Apache Flink SQL ++Let's create the Kafka table on Flink SQL, and select the Kafka table on Flink SQL. ++You're required to update your Kafka bootstrap server IPs in the below snippet. ++``` sql +CREATE TABLE KafkaTable ( +`userName` STRING, +`visitURL` STRING, +`ts` TIMESTAMP(3) METADATA FROM 'timestamp' +) WITH ( +'connector' = 'kafka', +'topic' = 'click_events', +'properties.bootstrap.servers' = '<update-kafka-bootstrapserver-ip>:9092,<update-kafka-bootstrapserver-ip>:9092,<update-kafka-bootstrapserver-ip>:9092', +'properties.group.id' = 'my_group', +'scan.startup.mode' = 'earliest-offset', +'format' = 'json' +); ++select * from KafkaTable; +``` +++### Produce Kafka messages ++Let's now produce Kafka messages to the same topic, using HDInsight Kafka +``` +python weblog.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic click_events +``` ++### Table on Apache Flink SQL ++You can monitor the table on Flink SQL +++Here are the streaming jobs on Flink Web UI +++## Reference ++* [Apache Kafka SQL Connector](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/kafka) |
hdinsight-aks | Datastream Api Mongodb | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/datastream-api-mongodb.md | + + Title: DataStream API for MongoDB as a source and sink on Apache Flink +description: Learn how to use DataStream API for MongoDB as a source and sink on Apache Flink ++ Last updated : 08/29/2023+++# DataStream API for MongoDB as a source and sink on Apache Flink +++Apache Flink provides a MongoDB connector for reading and writing data from and to MongoDB collections with at-least-once guarantees. ++This example demonstrates on how to use HDInsight on AKS Apache Flink 1.16.0 along with your existing MongoDB as Sink and Source with Flink DataStream API MongoDB connector. ++MongoDB is a non-relational document database that provides support for JSON-like storage that helps store complex structures easily. ++In this example, you learn how to use MongoDB to source and sink with DataStream API. ++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* For this demonstration, use a Window VM as maven project develop env in the same VNET as HDInsight on AKS. +* We use the [Apache Flink - MongoDB Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/mongodb/) +* For this demonstration, use an Ubuntu VM in the same VNET as HDInsight on AKS, install a MongoDB on this VM. ++## Installation of MongoDB on Ubuntu VM ++[Install MongoDB on Ubuntu](https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-ubuntu/) ++[MongoDB Shell commands](https://www.mongodb.com/docs/mongodb-shell/run-commands/) ++**Prepare MongoDB environment**: +``` +root@contosoubuntuvm:/var/lib/mongodb# vim /etc/mongod.conf ++# network interfaces +net: + port: 27017 + bindIp: 0.0.0.0 ++-- Start mongoDB +root@contosoubuntuvm:/var/lib/mongodb# systemctl start mongod +root@contosoubuntuvm:/var/lib/mongodb# systemctl status mongod +● mongod.service - MongoDB Database Server + Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled) + Active: active (running) since Fri 2023-06-16 00:07:39 UTC; 5s ago + Docs: https://docs.mongodb.org/manual + Main PID: 415775 (mongod) + Memory: 165.4M + CGroup: /system.slice/mongod.service + └─415775 /usr/bin/mongod --config /etc/mongod.conf ++Jun 16 00:07:39 contosoubuntuvm systemd[1]: Started MongoDB Database Server. +Jun 16 00:07:39 contosoubuntuvm mongod[415775]: {"t":{"$date":"2023-06-16T00:07:39.091Z"},"s":"I", "c":"CONTROL", "id":7484500, "ctx":"-","msg"> ++-- check connectivity +root@contosoubuntuvm:/var/lib/mongodb# telnet 10.0.0.7 27017 +Trying 10.0.0.7... +Connected to 10.0.0.7. +Escape character is '^]'. ++-- Use mongosh to connect to mongodb +root@contosoubuntuvm:/var/lib/mongodb# mongosh "mongodb://10.0.0.7:27017/test" +Current Mongosh Log ID: 648bccc3b8a6b0885614b2dc +Connecting to: mongodb://10.0.0.7:27017/test?directConnection=true&appName=mongosh+1.10.0 +Using MongoDB: 6.0.6 +Using Mongosh: 1.10.0 ++For mongosh info see: https://docs.mongodb.com/mongodb-shell/ +++ The server generated these startup warnings when booting + 2023-06-16T00:07:39.103+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem + 2023-06-16T00:07:40.108+00:00: Access control is not enabled for the database. 
Read and write access to data and configuration is unrestricted + 2023-06-16T00:07:40.108+00:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never' + 2023-06-16T00:07:40.108+00:00: vm.max_map_count is too low +++- Check `click_events` collection ++test> db.click_events.count() +0 +``` ++> [!NOTE] +> To ensure the MongoDB setup can be accessed outside, change bindIp to `0.0.0.0`. ++``` +vim /etc/mongod.conf +# network interfaces +net: + port: 27017 + bindIp: 0.0.0.0 +``` ++## Get started ++### Create a maven project on IdeaJ, to prepare the pom.xml for MongoDB Collection ++``` xml +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> +<modelVersion>4.0.0</modelVersion> ++<groupId>org.example</groupId> +<artifactId>MongoDBDemo</artifactId> +<version>1.0-SNAPSHOT</version> +<properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> +</properties> +<dependencies> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-mongodb</artifactId> + <version>1.0.1-1.16</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-common --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-common</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-api-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-api-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/net.java.loci/jsr308-all --> + <dependency> + <groupId>net.java.loci</groupId> + <artifactId>jsr308-all</artifactId> + <version>1.1.2</version> + </dependency> +</dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` ++### Generate a stream 
source and sink to the MongoDB collection:click_events +**MongoDBSinkDemo.java** +``` java +package contoso.example; ++import com.mongodb.client.model.InsertOneModel; +import org.apache.flink.connector.base.DeliveryGuarantee; +import org.apache.flink.connector.mongodb.sink.MongoSink; +import org.apache.flink.streaming.api.datastream.DataStreamSource; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.bson.BsonDocument; ++public class MongoDBSinkDemo { + public static void main(String[] args) throws Exception { + // 1. get stream env + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ++ // 2. event data source, update the ip address from 10.0.0.7 to your MongoDB IP + DataStreamSource<Event> stream = env.addSource(new ClickSource()); + stream.print(); ++ MongoSink<Event> sink = MongoSink.<Event>builder() + .setUri("mongodb://10.0.0.7:27017") + .setDatabase("test") + .setCollection("click_events") + .setBatchSize(1000) + .setBatchIntervalMs(1000) + .setMaxRetries(3) + .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE) + .setSerializationSchema( + (input, context) -> new InsertOneModel<>(BsonDocument.parse(String.valueOf(input)))) + .build(); ++ stream.sinkTo(sink); ++ env.execute("Sink click events to MongoDB"); + } +} +``` +**Stream click event source:** +**ClickSource.java** +``` java +package contoso.example; +import org.apache.flink.streaming.api.functions.source.SourceFunction; ++import java.util.Calendar; +import java.util.Random; ++public class ClickSource implements SourceFunction<Event> { + // declare a flag + private Boolean running = true; ++ // declare a flag + public void run(SourceContext<Event> ctx) throws Exception{ + // generate random record + Random random = new Random(); + String[] users = {"Mary","Alice","Bob","Cary"}; + String[] urls = {"./home","./cart","./fav","./prod?id=100","./prod?id=10"}; ++ // loop generate + while (running) { + String user = users[random.nextInt(users.length)]; + String url = urls[random.nextInt(urls.length)]; + Long timestamp = Calendar.getInstance().getTimeInMillis(); + String ts = timestamp.toString(); + ctx.collect(new Event(user,url,ts)); + Thread.sleep(2000); + } + } + @Override + public void cancel() + { + running = false; + } +} +``` ++**Event.java** +``` java +package contoso.example; +import java.sql.Timestamp; ++public class Event { ++ public String user; + public String url; + public String ts; ++ public Event() { + } ++ public Event(String user, String url, String ts) { + this.user = user; + this.url = url; + this.ts = ts; + } ++ @Override + public String toString(){ + return "{" + + "user: \"" + user + "\"" + + ",url: \"" + url + "\"" + + ",ts: " + ts + + "}"; + } +} +``` +### Use MongoDB as a source and sink to ADLS Gen2 ++Write a program for MongoDB as a source and sink to ADLS Gen2 ++**MongoDBSourceDemo.java** +``` java +package contoso.example; ++import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.serialization.SimpleStringEncoder; +import org.apache.flink.api.common.typeinfo.BasicTypeInfo; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.configuration.MemorySize; +import org.apache.flink.connector.file.sink.FileSink; +import org.apache.flink.connector.mongodb.source.MongoSource; +import org.apache.flink.connector.mongodb.source.enumerator.splitter.PartitionStrategy; +import org.apache.flink.connector.mongodb.source.reader.deserializer.MongoDeserializationSchema; 
+import org.apache.flink.core.fs.Path; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy; +import org.bson.BsonDocument; ++import java.time.Duration; ++public class MongoDBSourceDemo { + public static void main(String[] args) throws Exception { ++ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ++ MongoSource<String> mongoSource = MongoSource.<String>builder() + .setUri("mongodb://10.0.0.7:27017") // update with the correct IP address + .setDatabase("test") + .setCollection("click_events") + .setFetchSize(2048) + .setLimit(10000) + .setNoCursorTimeout(true) + .setPartitionStrategy(PartitionStrategy.SAMPLE) + .setPartitionSize(MemorySize.ofMebiBytes(64)) + .setSamplesPerPartition(10) + .setDeserializationSchema(new MongoDeserializationSchema<String>() { + @Override + public String deserialize(BsonDocument document) { + return document.toJson(); + } ++ @Override + public TypeInformation<String> getProducedType() { + return BasicTypeInfo.STRING_TYPE_INFO; + } + }) + .build(); ++ DataStream stream = env.fromSource(mongoSource, WatermarkStrategy.noWatermarks(), "MongoDB-Source"); + stream.print(); + // 3. sink to gen2, update with your container name and storage path + String outputPath = "abfs://<update-container>@<storage-path>.dfs.core.windows.net/flink/mongo_click_events"; + FileSink<String> gen2 = FileSink + .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8")) + .withRollingPolicy( + DefaultRollingPolicy.builder() + .withRolloverInterval(Duration.ofMinutes(5)) + .withInactivityInterval(Duration.ofMinutes(3)) + .withMaxPartSize(MemorySize.ofMebiBytes(5)) + .build()) + .build(); ++ stream.sinkTo(gen2); ++ env.execute("MongoDB as a Source Sink to Gen2"); + } +} +``` +### Package the maven jar, and submit to Apache Flink UI ++Package the maven jar, upload it to Storage and then wget it to [Flink CLI](./flink-web-ssh-on-portal-to-flink-sql.md) or directly upload to Flink UI to run. +++**Check Flink UI** +++### Validate results ++**Sink click events to Mongo DB's admin.click_events collection** +``` +test> db.click_events.count() +24 ++test> db.click_events.find() +[ + { + _id: ObjectId("648bc933a68ca7614e1f87a2"), + user: 'Alice', + url: './prod?id=10', + ts: Long("1686882611148") + }, + { + _id: ObjectId("648bc935a68ca7614e1f87a3"), + user: 'Bob', + url: './prod?id=10', + ts: Long("1686882613148") + }, +……. ++``` +**Use Mongo DB's admin.click_events collection as a source, and sink to ADLS Gen2** + |
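The sink in `MongoDBSinkDemo` builds each document by parsing `Event.toString()`. As an alternative sketch (an assumption, not part of the original sample), the `BsonDocument` can be constructed field by field, which avoids any dependency on the exact `toString()` format; it reuses the same builder options and the `Event` class shown earlier.

```java
package contoso.example;

import com.mongodb.client.model.InsertOneModel;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.mongodb.sink.MongoSink;
import org.bson.BsonDocument;
import org.bson.BsonString;

public class ExplicitMongoSinkSketch {
    // Builds a MongoSink that writes each Event as an explicitly constructed BsonDocument.
    public static MongoSink<Event> build(String mongoUri) {
        return MongoSink.<Event>builder()
                .setUri(mongoUri) // for example "mongodb://10.0.0.7:27017"
                .setDatabase("test")
                .setCollection("click_events")
                .setBatchSize(1000)
                .setBatchIntervalMs(1000)
                .setMaxRetries(3)
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .setSerializationSchema((input, context) -> {
                    BsonDocument doc = new BsonDocument();
                    doc.put("user", new BsonString(input.user));
                    doc.put("url", new BsonString(input.url));
                    doc.put("ts", new BsonString(input.ts));
                    return new InsertOneModel<>(doc);
                })
                .build();
    }
}
```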
hdinsight-aks | Fabric Lakehouse Flink Datastream Api | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/fabric-lakehouse-flink-datastream-api.md | + + Title: Microsoft Fabric with Apache Flink in HDInsight on AKS +description: An introduction to lakehouse on Microsoft Fabric with Apache Flink over HDInsight on AKS ++ Last updated : 08/29/2023++# Connect to OneLake in Microsoft Fabric with HDInsight on AKS cluster for Apache Flink +++This example demonstrates how to use HDInsight on AKS Apache Flink with [Microsoft Fabric](/fabric/get-started/microsoft-fabric-overview). ++[Microsoft Fabric](/fabric/get-started/microsoft-fabric-overview) is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. +With Fabric, you don't need to piece together different services from multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-use product that is designed to simplify your analytics needs. ++In this example, you learn how to connect to OneLake in Microsoft Fabric with an HDInsight on AKS cluster for Apache Flink. ++## Prerequisites +* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* Create a workspace with at least a Premium capacity license mode on [Power BI](https://app.powerbi.com/) +* [Create a lakehouse](/fabric/data-engineering/tutorial-build-lakehouse) in this workspace ++## Connect to OneLake storage ++### Microsoft Fabric and Lakehouse ++**Lakehouse in Microsoft Fabric** ++[Microsoft Fabric Lakehouse](/fabric/data-engineering/lakehouse-overview) is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. ++> [!Note] +> [Microsoft Fabric](/fabric/get-started/microsoft-fabric-overview) is in [preview](/fabric/get-started/preview) ++#### Managed identity access to the Fabric workspace ++In this step, we give the *user-assigned managed identity* access to Fabric. Enter the name of the *user-assigned managed identity* and add it to your Fabric workspace. ++ :::image type="content" source="./media/fabric-lakehouse-flink-datastream-api/managed-identity-access-fabric.png" alt-text="Screenshot showing how to provide access to the user managed identity to Fabric." border="true" lightbox="./media/fabric-lakehouse-flink-datastream-api/managed-identity-access-fabric.png"::: ++#### Prepare a Delta table under the lakehouse Files folder ++In this step, we prepare a Delta table under the lakehouse Files folder on Microsoft Fabric; with this setup, Flink developers can build into a broader lakehouse architecture. +++### Apache Flink DataStream Source code ++In this step, we prepare the jar to submit to the HDInsight on AKS Apache Flink cluster. 
++This step illustrates, that we package dependencies needed for onelakeDemo ++**maven pom.xml** +``` xml + <properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> + </properties> + <dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-core</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-avro --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-avro</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-api-java-bridge --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-api-java-bridge</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table</artifactId> + <version>${flink.version}</version> + <type>pom</type> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-parquet</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.parquet</groupId> + <artifactId>parquet-avro</artifactId> + <version>1.12.2</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-core</artifactId> + <version>1.2.1</version> + </dependency> + </dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` ++**main: onelakeDemo** ++In this step, we read parquet file on Fabric lakehouse and then sink to another file in the same folder: ++``` java +package contoso.example; ++import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.connector.file.src.FileSource; +import org.apache.flink.core.fs.Path; ++import org.apache.flink.formats.parquet.avro.AvroParquetReaders; +import 
org.apache.flink.formats.parquet.avro.ParquetAvroWriters; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.api.functions.sink.SinkFunction; +import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink; ++public class onelakeDemo { + public static void main(String[] args) throws Exception { + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ++ Path sourcePath = new Path("abfss://contosoworkspace1@msit-onelake.dfs.fabric.microsoft.com/contosolakehouse.Lakehouse/Files/delta/tab1/"); + Path sinkPath = new Path("abfss://contosoworkspace1@msit-onelake.dfs.fabric.microsoft.com/contosolakehouse.Lakehouse/Files/delta/tab1_out/"); ++ Schema avroSchema = new Schema.Parser() + .parse("{\"type\":\"record\",\"name\":\"example\",\"fields\":[{\"name\":\"Date\",\"type\":\"string\"},{\"name\":\"Time\",\"type\":\"string\"},{\"name\":\"TargetTemp\",\"type\":\"string\"},{\"name\":\"ActualTemp\",\"type\":\"string\"},{\"name\":\"System\",\"type\":\"string\"},{\"name\":\"SystemAge\",\"type\":\"string\"},{\"name\":\"BuildingID\",\"type\":\"string\"}]}"); ++ FileSource<GenericRecord> source = + FileSource.forRecordStreamFormat( + AvroParquetReaders.forGenericRecord(avroSchema), sourcePath) + .build(); ++ StreamingFileSink<GenericRecord> sink = + StreamingFileSink.forBulkFormat( + sinkPath, + ParquetAvroWriters.forGenericRecord(avroSchema)) + .build(); ++ env.enableCheckpointing(10L); ++ DataStream<GenericRecord> stream = + env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source"); ++ stream.addSink((SinkFunction<GenericRecord>) sink); + env.execute(); + } +} +``` +### Package the jar and submit to Flink ++Here, we use the packaged jar and submit to Flink cluster ++++### Results on Microsoft Fabric ++Let's check the output on Microsoft Fabric ++++### References +* [Microsoft Fabric](/fabric/get-started/microsoft-fabric-overview) +* [Microsoft Fabric Lakehouse](/fabric/data-engineering/lakehouse-overview) |
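For reference, the packaging and submission described above can also be done from the command line. The following is a minimal sketch; the artifact name depends on your own Maven project settings, and the jar path and `-c` entry class shown here are assumptions based on the sample code.

```
# Build the fat jar defined by the pom.xml above (artifact name depends on your artifactId/version).
mvn clean package

# Submit the job to the Flink cluster; adjust the jar path and entry class to match your build output.
bin/flink run -c contoso.example.onelakeDemo target/<your-artifact>-<version>.jar
```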
hdinsight-aks | Flink Catalog Delta Hive | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-catalog-delta-hive.md | + + Title: Table API and SQL - Use Delta Catalog type with Hive in HDInsight on AKS - Apache Flink +description: Learn about how to create Apache Flink-Delta Catalog in HDInsight on AKS - Apache Flink ++ Last updated : 08/29/2023+++# Create Apache Flink-Delta Catalog +++[Delta Lake](https://docs.delta.io/latest/delta-intro.html) is an open source project that enables building a Lakehouse architecture on top of data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes. ++In this article, we learn how Apache Flink SQL/TableAPI is used to implement a Delta catalog for Apache Flink, with Hive Catalog. Delta Catalog delegates all metastore communication to Hive Catalog. It uses the existing logic for Hive or In-Memory metastore communication that is already implemented in Flink. ++### Prerequisites +- You're required to have an operational Flink cluster with secure shell, learn how to [create a cluster](./flink-create-cluster-portal.md) +- You can refer this article on how to use CLI from [Secure Shell](./flink-web-ssh-on-portal-to-flink-sql.md) on Azure portal. ++### Add dependencies ++Once you launch the Secure Shell (SSH), let us start downloading the dependencies required to the SSH node, to illustrate the Delta table managed in Hive catalog. ++ ``` + wget https://repo1.maven.org/maven2/io/delta/delta-standalone_2.12/3.0.0rc1/delta-standalone_2.12-3.0.0rc1.jar -P $FLINK_HOME/lib + wget https://repo1.maven.org/maven2/io/delta/delta-flink/3.0.0rc1/delta-flink-3.0.0rc1.jar -P $FLINK_HOME/lib + wget https://repo1.maven.org/maven2/com/chuusai/shapeless_2.12/2.3.4/shapeless_2.12-2.3.4.jar -P $FLINK_HOME/lib + wget https://repo1.maven.org/maven2/org/apache/flink/flink-parquet/1.16.0/flink-parquet-1.16.0.jar -P $FLINK_HOME/lib + wget https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop-bundle/1.12.2/parquet-hadoop-bundle-1.12.2.jar -P $FLINK_HOME/lib + ``` ++### Start the Apache Flink SQL Client +A detailed explanation is given on how to get started with Flink SQL Client using [Secure Shell](./flink-web-ssh-on-portal-to-flink-sql.md) on Azure portal. You're required to start the SQL Client as described on the article by running the following command. +``` +./bin/sql-client.sh +``` +#### Create Delta Catalog using Hive catalog ++```sql + CREATE CATALOG delta_catalog WITH ( + 'type' = 'delta-catalog', + 'catalog-type' = 'hive'); +``` +Using the delta catalog ++```sql + USE CATALOG delta_catalog; +``` ++#### Add dependencies to server classpath ++```sql + ADD JAR '/opt/flink-webssh/lib/delta-flink-3.0.0rc1.jar'; + ADD JAR '/opt/flink-webssh/lib/delta-standalone_2.12-3.0.0rc1.jar'; + ADD JAR '/opt/flink-webssh/lib/shapeless_2.12-2.3.4.jar'; + ADD JAR '/opt/flink-webssh/lib/parquet-hadoop-bundle-1.12.2.jar'; + ADD JAR '/opt/flink-webssh/lib/flink-parquet-1.16.0.jar'; +``` +#### Create Table ++We use arrival data of flights from a sample data, you can choose a table of your choice. 
++
```sql
 + CREATE TABLE flightsintervaldata1 (arrivalAirportCandidatesCount INT, estArrivalHour INT) PARTITIONED BY (estArrivalHour) WITH ('connector' = 'delta', 'table-path' = 'abfs://container@storage_account.dfs.core.windows.net/delta-output');
+```
+> [!NOTE]
+> In the above step, the container and storage account *don't need to be the same* as those specified during cluster creation. If you want to use another storage account, update `core-site.xml` with `fs.azure.account.key.<account_name>.dfs.core.windows.net: <azure_storage_key>` by using configuration management.
++#### Insert Data into the Delta Table
++```sql
 + INSERT INTO flightsintervaldata1 SELECT 76, 12;
+```
++> [!IMPORTANT]
+> - The Delta-Flink connector has a known [issue](https://github.com/delta-io/delta/issues/1931) with the String data type: String columns aren't consumed properly by delta-flink, whether used for partitioning or otherwise.
+> - Delta-Flink has a known [issue](https://github.com/delta-io/delta/issues/1971) with viewing the table schema in Trino when the table is registered in Hive metastore (HMS) from Flink. Because of this issue, read and write operations using Trino with the same Flink HMS aren't operational.
++#### Output of the Delta Table
++You can view the Delta Table output on the ABFS container.
 |
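You can also read the rows back from the same Flink SQL client session to confirm the insert. This is a minimal sketch using the table created above:

```sql
SELECT * FROM flightsintervaldata1;
```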
hdinsight-aks | Flink Catalog Iceberg Hive | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-catalog-iceberg-hive.md | + + Title: Table API and SQL - Use Iceberg Catalog type with Hive in HDInsight on AKS - Apache Flink +description: Learn how to create Apache Flink-Iceberg Catalog in HDInsight on AKS - Apache Flink ++ Last updated : 08/29/2023+++# Create Apache Flink-Iceberg Catalog +++[Apache Iceberg](https://iceberg.apache.org/) is an open table format for huge analytic datasets. Iceberg adds tables to compute engines like Flink, using a high-performance table format that works just like a SQL table. Apache Iceberg [supports](https://iceberg.apache.org/multi-engine-support/#apache-flink) both Apache FlinkΓÇÖs DataStream API and Table API. ++In this article, we learn how to use Iceberg Table managed in Hive catalog, with HDInsight on AKS - Flink ++## Prerequisites +- You're required to have an operational Flink cluster with secure shell, learn how to [create a cluster](../flink/flink-create-cluster-portal.md) + - Refer this article on how to use CLI from [Secure Shell](./flink-web-ssh-on-portal-to-flink-sql.md) on Azure portal. ++### Add dependencies ++Once you launch the Secure Shell (SSH), let us start downloading the dependencies required to the SSH node, to illustrate the Iceberg table managed in Hive catalog. ++ ``` + wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.16/1.3.0/iceberg-flink-runtime-1.16-1.3.0.jar -P $FLINK_HOME/lib + wget https://repo1.maven.org/maven2/org/apache/parquet/parquet-column/1.12.2/parquet-column-1.12.2.jar -P $FLINK_HOME/lib + ``` ++## Start the Apache Flink SQL Client +A detailed explanation is given on how to get started with Flink SQL Client using [Secure Shell](./flink-web-ssh-on-portal-to-flink-sql.md) on Azure portal. You're required to start the SQL Client as described on the article by running the following command. +``` +./bin/sql-client.sh +``` +### Create Iceberg Table managed in Hive catalog ++With the following steps, we illustrate how you can create Flink-Iceberg Catalog using Hive catalog ++```sql + CREATE CATALOG hive_catalog WITH ( + 'type'='iceberg', + 'catalog-type'='hive', + 'uri'='thrift://hive-metastore:9083', + 'clients'='5', + 'property-version'='1', + 'warehouse'='abfs://container@storage_account.dfs.core.windows.net/ieberg-output'); +``` +> [!NOTE] +> - In the above step, the container and storage account *need not be same* as specified during the cluster creation. +> - In case you want to specify another storage account, you can update `core-site.xml` with `fs.azure.account.key.<account_name>.dfs.core.windows.net: <azure_storage_key>` using configuration management. ++```sql + USE CATALOG hive_catalog; +``` ++#### Add dependencies to server classpath ++```sql + ADD JAR '/opt/flink-webssh/lib/iceberg-flink-runtime-1.16-1.3.0.jar'; + ADD JAR '/opt/flink-webssh/lib/parquet-column-1.12.2.jar'; +``` +#### Create Database ++```sql + CREATE DATABASE iceberg_db_2; + USE iceberg_db_2; +``` +#### Create Table ++```sql + CREATE TABLE `hive_catalog`.`iceberg_db_2`.`iceberg_sample_2` + ( + id BIGINT COMMENT 'unique id', + data STRING + ) + PARTITIONED BY (data); +``` +#### Insert Data into the Iceberg Table ++```sql + INSERT INTO `hive_catalog`.`iceberg_db_2`.`iceberg_sample_2` VALUES (1, 'a'); +``` ++#### Output of the Iceberg Table ++You can view the Iceberg Table output on the ABFS container + |
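Similarly, a quick read from the Flink SQL client confirms the insert before you check the container. This is a minimal sketch against the table created above:

```sql
SELECT * FROM `hive_catalog`.`iceberg_db_2`.`iceberg_sample_2`;
```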
hdinsight-aks | Flink Cluster Configuration | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-cluster-configuration.md | + + Title: Flink cluster configuration - HDInsight on AKS - Apache Flink
+description: Learn how to troubleshoot Flink cluster configuration in HDInsight on AKS - Apache Flink
++ Last updated : 09/26/2023
+++# Troubleshoot Flink cluster configuration
+++Incorrect cluster configuration may lead to deployment errors. Typically, those errors occur when an incorrect configuration is provided in the ARM template or entered in the Azure portal, for example, on the [Configuration management](flink-configuration-management.md) page.
++Example configuration error:
++ :::image type="image" source="./media/flink-cluster-configuration/error.png" alt-text="Screenshot shows error." border="true" lightbox="./media/flink-cluster-configuration/error.png":::
++The following tables provide error codes and their descriptions to help you diagnose and fix common errors.
++## Configuration error
++| Error Code | Description |
+|||
+| FlinkClusterValidator#IdentityValidator | Checks if the task manager (TM) and job manager (JM) process size has the suffix `mb`. |
+| |Checks if the TM and JM process size is less than the configured pod memory. |
+|FlinkClusterValidator#IdentityValidator | Verifies if the pod identity is configured correctly. |
+| FlinkClusterValidator#ClusterSpecValidator | Checks if the configured JM, TM, and history server (HS) pod CPU is within the configurable/allocatable SKU limits. |
+| |Checks if the configured JM, TM, and history server (HS) pod memory is within the configurable/allocatable SKU limits. |
+| FlinkClusterValidator#StorageSpecValidator | Validates that the storage container has an appropriate name. |
+| | Verifies the storage against the supported storage types. |
++## System error
++Some errors can occur because of environment conditions and be transient. These errors have a reason code that starts with the "System" prefix. In such cases, try the following steps:
++1. Collect the following information:
++ - The Azure request CorrelationId. It can be found in the Notifications area, on the Deployments page of the resource group where the cluster is located, or in the `az` command output.
++ - The DeploymentId. It can be found on the cluster Overview page.
++ - The detailed error message.
++1. Contact the support team with this information.
++| Error code | Description |
+|||
+| System.DependencyFailure | Failure in one of the cluster components. |
+++ |
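If you need to look up the Azure request CorrelationId for a deployment, one option is the Azure CLI. This is a minimal sketch; the resource group and deployment names are assumptions you need to replace with your own.

```
# List recent deployments in the resource group that contains the cluster (resource group name is an assumption).
az deployment group list --resource-group my-hdinsight-rg --query "[].{name:name, state:properties.provisioningState}" -o table

# Show the correlation ID for a specific deployment (deployment name is an assumption).
az deployment group show --resource-group my-hdinsight-rg --name my-flink-cluster-deployment --query properties.correlationId -o tsv
```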
hdinsight-aks | Flink Configuration Management | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-configuration-management.md | + + Title: Apache Flink Configuration Management in HDInsight on AKS
+description: Learn about Apache Flink Configuration Management in HDInsight on AKS
++ Last updated : 08/29/2023
+++# Apache Flink configuration management
+++HDInsight on AKS provides a set of default Apache Flink configurations for most properties, and a few based on common application profiles. However, if you need to tweak Flink configuration properties to improve performance for certain applications (for example, state usage, parallelism, or memory settings), you can change certain properties at the cluster level by using the **Configuration management** section of your HDInsight on AKS Flink cluster.
++1. Go to the **Configuration Management** section on your Apache Flink cluster page.
++ :::image type="content" source="./media/flink-configuration-management/configuration-page-revised.png" alt-text="Screenshot showing Apache Flink Configuration Management page." lightbox="./media/flink-configuration-management/configuration-page-revised.png":::
++2. Update the **configurations** as required at the *cluster level*.
++ :::image type="content" source="./media/flink-configuration-management/update-configuration-revised.png" alt-text="Screenshot showing Apache Flink Update configuration page." lightbox="./media/flink-configuration-management/update-configuration-revised.png":::
+
Here, the checkpoint interval is changed at the *cluster level*.
++3. Save the changes by selecting **OK** and then **Save**.
++Once saved, the new configurations are applied within a few minutes (~5 minutes).
++The following configurations can be updated by using the Configuration Management settings.
++**Process memory size**
++By default, the process memory size of the job manager and task manager is the memory configured by the user during cluster creation.
++This size can be configured by using the following configuration properties. To change the task manager process memory, use this configuration:
++`taskmanager.memory.process.size : <value>`
++Example:
+`taskmanager.memory.process.size : 2000mb`
++For the job manager:
++`jobmanager.memory.process.size : <value>`
++> [!NOTE]
+> The maximum configurable process memory is equal to the memory configured for `jobmanager/taskmanager`.
++## Checkpoint Interval
++The checkpoint interval determines how often Flink triggers a checkpoint. It's defined in milliseconds and can be set using the following configuration property:
++`execution.checkpoint.interval: <value>`
++The default setting is 60,000 milliseconds (1 minute); this value can be changed as desired.
++## State Backend
++The state backend determines how Flink manages and persists the state of your application. It impacts how checkpoints are stored. You can configure the state backend using the following property:
++`state.backend: <value>`
++By default, HDInsight on AKS Flink uses RocksDB.
++## Checkpoint Storage Path
++We allow persistent checkpoints by default by storing the checkpoints in the `abfs` storage configured by the user. Because the checkpoints are persisted, even if the job fails it can easily be restarted from the latest checkpoint.
++`state.checkpoints.dir: <path>`
+Replace `<path>` with the desired path where the checkpoints are stored.
++By default, checkpoints are stored in the storage account (ABFS) configured by the user. This value can be changed to any desired path as long as the Flink pods can access it. 
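Putting the properties discussed so far together, a cluster-level configuration update might look like the following sketch. The keys are the ones described above; the values are illustrative only, not recommendations.

```
taskmanager.memory.process.size : 2000mb
jobmanager.memory.process.size : 2000mb
execution.checkpoint.interval: 60000
state.backend: rocksdb
state.checkpoints.dir: abfs://<container>@<account>.dfs.core.windows.net/flink/checkpoints
```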
++## Maximum Concurrent Checkpoints
++You can limit the maximum number of concurrent checkpoints by setting the following property:
+`checkpoint.max-concurrent-checkpoints: <value>`
++Replace `<value>` with the desired maximum number of concurrent checkpoints. For example, set it to 1 to allow only one checkpoint at a time.
+
++## Maximum retained checkpoints
++You can limit the maximum number of checkpoints to retain by setting the following property:
+`state.checkpoints.num-retained: <value>`
+Replace `<value>` with the desired maximum number. By default, a maximum of five checkpoints are retained.
++## Savepoint Storage Path
++We allow persistent savepoints by default by storing the savepoints in the `abfs` storage configured by the user. If you want to stop a job and later start it from a particular savepoint, you can configure this location:
+`state.savepoints.dir: <path>`
+Replace `<path>` with the desired path where the savepoints are stored.
+By default, savepoints are stored in the storage account configured by the user (ABFS is supported). This value can be changed to any desired path as long as the Flink pods can access it.
++## Job manager high availability
++In HDInsight on AKS, Flink uses Kubernetes as the backend. Even if the job manager fails because of a known or unknown issue, the pod is restarted within a few seconds. Hence, even if the job restarts because of this issue, it is recovered from the **latest checkpoint**.
++### FAQ
++**Why does the job fail in between?**
++Even if jobs fail abruptly, as long as checkpoints are taken continuously, the job is restarted by default from the latest checkpoint.
++**How do I change the job strategy in between?**
++There are use cases where the job needs to be modified while in production because of a job-level bug. In that case, the user can stop the job, which automatically takes a savepoint and saves it in the savepoint location.
++`bin/flink stop <JOBID>`
++Example:
++```
+root [ ~ ]# ./bin/flink stop 60bdf21d9bc3bc65d63bc3d8fc6d5c54
+Suspending job "60bdf21d9bc3bc65d63bc3d8fc6d5c54" with a CANONICAL savepoint.
+Savepoint completed. Path: abfs://flink061920231244@f061920231244st.dfs.core.windows.net/8255a11812144c28b4ddf1068460c96b/savepoints/savepoint-60bdf2-7717485d15e3
+```
++Later, the user can start the job with the bug fix, pointing to the savepoint.
++```
+./bin/flink run <JOB_JAR> -d <SAVEPOINT_LOC>
+root [ ~ ]# ./bin/flink run examples/streaming/StateMachineExample.jar -s abfs://flink061920231244@f061920231244st.dfs.core.windows.net/8255a11812144c28b4ddf1068460c96b/savepoints/savepoint-60bdf2-7717485d15e3
+```
+Usage with built-in data generator: `StateMachineExample [--error-rate <probability-of-invalid-transition>] [--sleep <sleep-per-record-in-ms>]`
++Usage with Kafka: `StateMachineExample --kafka-topic <topic> [--brokers <brokers>]`
++Because the savepoint is provided with the job, Flink knows where to start processing the data from.
+++### Reference
+[Apache Flink Configurations](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/) |
hdinsight-aks | Flink Create Cluster Portal | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-create-cluster-portal.md | + + Title: Create an Apache Flink cluster - Azure portal +description: Creating an Apache Flink cluster in HDInsight on AKS in the Azure portal. ++ Last updated : 08/29/2023+++# Create an Apache Flink cluster in the Azure portal +++Complete the following steps to create an Apache Flink cluster by using the Azure portal. ++## Prerequisites ++Complete the prerequisites in the following sections: +* [Resource prerequisites](../prerequisites-resources.md) +* [Create a cluster pool](../quickstart-create-cluster.md#create-a-cluster-pool) ++> [!IMPORTANT] +> * For creating a cluster in new cluster pool, assign AKS agentpool MSI "Managed Identity Operator" role on the user-assigned managed identity created as part of resource prerequisite. In case you have required permissions, this step is automated during creation. +> * AKS agentpool managed identity gets created during cluster pool creation. You can identify the AKS agentpool managed identity by **(your clusterpool name)-agentpool**. Follow these steps to [assign the role](../../role-based-access-control/role-assignments-portal.md#step-2-open-the-add-role-assignment-page). ++## Create an Apache Flink cluster ++Flink clusters can be created once cluster pool deployment has been completed, let us go over the steps in case you're getting started with an existing cluster pool ++1. In the Azure portal, type *HDInsight cluster pools/HDInsight/HDInsight on AKS* and select Azure HDInsight on AKS cluster pools to go to the cluster pools page. On the HDInsight on AKS cluster pools page, select the cluster pool in which you want to create a new Flink cluster. + + :::image type="content" source="./media/create-flink-cluster/search-bar.png" alt-text="Diagram showing search bar in Azure portal."::: ++1. On the specific cluster pool page, click [**+ New cluster**](../quickstart-create-cluster.md) and provide the following information: ++ | Property| Description| + ||| + |Subscription | This field is autopopulated with the Azure subscription that was registered for the Cluster Pool.| + |Resource Group|This field is autopopulated and shows the resource group on the cluster pool.| + |Region|This field is autopopulated and shows the region selected on the cluster pool.| + |Cluster Pool|This field is autopopulated and shows the cluster pool name on which the cluster is now getting created.To create a cluster in a different pool, find that cluster pool in the portal and click **+ New cluster**.| + |HDInsight on AKS Pool Version|This field is autopopulated and shows the cluster pool version on which the cluster is now getting created.| + |HDInsight on AKS Version | Select the minor or patch version of the HDInsight on AKS of the new cluster.| + |Cluster type | From the drop-down list, select Flink.| + |Cluster name|Enter the name of the new cluster.| + |User-assigned managed identity | From the drop-down list, select the managed identity to use with the cluster. If you're the owner of the Managed Service Identity (MSI), and the MSI doesn't have Managed Identity Operator role on the cluster, click the link below the box to assign the permission needed from the AKS agent pool MSI. If the MSI already has the correct permissions, no link is shown. 
See the [Prerequisites](#prerequisites) for other role assignments required for the MSI.| + |Storage account|From the drop-down list, select the storage account to associate with the Flink cluster and specify the container name. The managed identity is further granted access to the specified storage account, using the 'Storage Blob Data Owner' role during cluster creation.| + |Virtual network | The virtual network for the cluster.| + |Subnet|The virtual subnet for the cluster.| ++1. Enabling **Hive catalog** for Flink SQL. ++ |Property| Description| + ||| + |Use Hive catalog|Enable this option to use an external Hive metastore. | + |SQL Database for Hive|From the drop-down list, select the SQL Database in which to add hive-metastore tables.| + |SQL admin username|Enter the SQL server admin username. This account is used by metastore to communicate to SQL database.| + |Key vault|From the drop-down list, select the Key Vault, which contains a secret with password for SQL server admin username. You are required to set up an access policy with all required permissions such as key permissions, secret permissions and certificate permissions to the MSI, which is being used for the cluster creation. The MSI needs a Key Vault Administrator role, add the required permissions using IAM.| + |SQL password secret name|Enter the secret name from the Key Vault where the SQL database password is stored.| ++ :::image type="content" source="./media/create-flink-cluster/flink-basics-page.png" alt-text="Screenshot showing basic tab."::: + > [!NOTE] + > By default we use the **Storage account** for Hive catalog same as the storage account and container used during cluster creation. ++1. Select **Next: Configuration** to continue. ++1. On the **Configuration** page, provide the following information: ++ |Property|Description| + ||| + |Node size|Select the node size to use for the Flink nodes both head and worker nodes.| + |Number of nodes|Select the number of nodes for Flink cluster; by default head nodes are two. The worker nodes sizing helps determine the task manager configurations for the Flink. The job manager and history server are on head nodes.| ++1. On the **Service Configuration** section, provide the following information: ++ |Property|Description| + ||| + |Task manager CPU|Integer. Enter the size of the Task manager CPUs (in cores).| + |Task manager memory in MB|Enter the Task manager memory size in MB. Min of 1800 MB.| + |Job manager CPU|Integer. Enter the number of CPUs for the Job manager (in cores).| + |Job manager memory in MB | Enter the memory size in MB. Minimum of 1800 MB.| + |History server CPU|Integer. Enter the number of CPUs for the Job manager (in cores).| + |History server memory in MB | Enter the memory size in MB. Minimum of 1800 MB.| ++ :::image type="content" source="./media/create-flink-cluster/flink-configuration-page.png" alt-text="screenshot showing configurations tab."::: ++ > [!NOTE] + > * History server can be enabled/disabled as required. + > * Schedule based autoscale is supported in Flink. You can schedule number of worker nodes as required. For example, it is enabled a schedule based autoscale with default worker node count as 3. And during weekdays from 9:00 UTC to 20:00 UTC, the worker nodes are scheduled to be 10. Later in the day, it needs to be defaulted to 3 nodes ( between 20:00 UTC to next day 09:00 UTC ). During weekends from 9:00 UTC to 20:00 UTC, worker nodes are 4. ++1. 
On the **Auto Scale & SSH** section, update the following: ++ |Property|Description| + ||| + |Auto Scale|Upon selection, you would be able to choose the schedule based autoscale to configure the schedule for scaling operations.| + |Enable SSH|Upon selection, you can opt for total number of SSH nodes required, which are the access points for the Flink CLI using Secure Shell. The maximum SSH nodes allowed is 5.| + + :::image type="content" source="./media/create-flink-cluster/service-configuration.png" alt-text="Screenshot showing autoscale service configuration."::: + + :::image type="content" source="./media/create-flink-cluster/autoscale-rules.png" alt-text="Screenshot showing auto scale rules."::: +1. Click the **Next: Integration** button to continue to the next page. ++1. On the **Integration** page, provide the following information: ++ |Property|Description| + ||| + |Log analytics| This feature is available only if the cluster pool has associated log analytics workspace, once enabled the logs to collect can be selected.| + |Azure Prometheus | This feature is to view Insights and Logs directly in your cluster by sending metrics and logs to Azure Monitor workspace.| ++ :::image type="content" source="./media/create-flink-cluster/flink-integrations-page.png" alt-text="screenshot showing integrations tab."::: + +1. Click the **Next: Tags** button to continue to the next page. + +1. On the **Tags** page, provide the following information: ++ | Property | Description| + ||| + |Name | Optional. Enter a name such as HDInsight on AKS to easily identify all resources associated with your cluster resources.| + | Value | You can leave this blank.| + | Resource | Select All resources selected.| ++1. Select **Next: Review + create** to continue. + +1. On the **Review + create** page, look for the **Validation succeeded** message at the top of the page and then click **Create**. ++The **Deployment is in process** page is displayed which the cluster is created. It takes 5-10 minutes to create the cluster. Once the cluster is created, the **"Your deployment is complete"** message is displayed. If you navigate away from the page, you can check your Notifications for the current status. |
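If you need to assign the "Managed Identity Operator" role to the AKS agent pool MSI yourself, as called out in the prerequisites above, the Azure CLI is one option. This is a minimal sketch; the principal ID and the resource ID of the user-assigned managed identity are placeholders you need to supply.

```
# Assign the Managed Identity Operator role to the AKS agentpool MSI ("<clusterpool-name>-agentpool")
# on the user-assigned managed identity created as part of the resource prerequisites.
az role assignment create \
  --assignee <agentpool-msi-object-id> \
  --role "Managed Identity Operator" \
  --scope <user-assigned-managed-identity-resource-id>
```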
hdinsight-aks | Flink How To Setup Event Hub | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-how-to-setup-event-hub.md | + + Title: How to connect HDInsight on AKS Flink with Azure Event Hubs for Apache Kafka® +description: Learn how to connect HDInsight on AKS Flink with Azure Event Hubs for Apache Kafka® ++ Last updated : 08/29/2023+++# Connect HDInsight on AKS Flink with Azure Event Hubs for Apache Kafka® +++A well known use case for Apache Flink is stream analytics. The popular choice by many users to use the data streams, which are ingested using Apache Kafka. Typical installations of Flink and Kafka start with event streams being pushed to Kafka, which can be consumed by Flink jobs. Azure Event Hubs provides an Apache Kafka endpoint on an event hub, which enables users to connect to the event hub using the Kafka protocol. ++In this article, we explore how to connect [Azure Event Hubs](/azure/event-hubs/event-hubs-about) with [HDInsight on AKS Flink](./flink-overview.md) and cover the following ++> [!div class="checklist"] +> * Create an Event Hubs namespace +> * Create a HDInsight on AKS Cluster with Apache Flink +> * Run Flink producer +> * Package Jar for Apache Flink +> * Job Submission & Validation ++## Create Event Hubs namespace and Event Hubs ++1. To create Event Hubs namespace and Event Hubs, see [here](/azure/event-hubs/event-hubs-quickstart-kafka-enabled-event-hubs?tabs=connection-string) ++ :::image type="content" source="./media/flink-eventhub/flink-setup-event-hub.png" alt-text="Screenshot showing Event Hubs setup." border="true" lightbox="./media/flink-eventhub/flink-setup-event-hub.png"::: ++## Set up Flink Cluster on HDInsight on AKS ++1. Using existing HDInsight on AKS Cluster pool you can create a [Flink cluster](./flink-create-cluster-portal.md) ++1. Run the Flink producer adding the **bootstrap.servers** and the `producer.config` info ++ ``` + bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093 + client.id=FlinkExampleProducer + sasl.mechanism=PLAIN + security.protocol=SASL_SSL + sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ + username="$ConnectionString" \ + password="{YOUR.EVENTHUBS.CONNECTION.STRING}"; + ``` + +1. Replace `{YOUR.EVENTHUBS.CONNECTION.STRING}` with the connection string for your Event Hubs namespace. For instructions on getting the connection string, see details on how to [get an Event Hubs connection string](/azure/event-hubs/event-hubs-get-connection-string). ++ For example, + ``` + sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" + password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXXXXXXXXXXX"; + ``` +## Packaging the JAR for Flink +1. Package com.example.app; + + ``` + import org.apache.flink.api.common.functions.MapFunction; + import org.apache.flink.api.common.serialization.SimpleStringSchema; + import org.apache.flink.streaming.api.datastream.DataStream; + import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; + import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer; //v0.11.0.0 + import java.io.FileNotFoundException; + import java.io.FileReader; + import java.util.Properties; ++ public class FlinkTestProducer { ++ private static final String TOPIC = "test"; + private static final String FILE_PATH = "src/main/resources/producer.config"; ++ public static void main(String... 
args) { + try { + Properties properties = new Properties(); + properties.load(new FileReader(FILE_PATH)); ++ final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + DataStream stream = createStream(env); + FlinkKafkaProducer<String> myProducer = new FlinkKafkaProducer<>( + TOPIC, + new SimpleStringSchema(), // serialization schema + properties); ++ stream.addSink(myProducer); + env.execute("Testing flink print"); ++ } catch(FileNotFoundException e){ + System.out.println("FileNotFoundException: " + e); + } catch (Exception e) { + System.out.println("Failed with exception:: " + e); + } + } ++ public static DataStream createStream(StreamExecutionEnvironment env){ + return env.generateSequence(0, 200) + .map(new MapFunction<Long, String>() { + @Override + public String map(Long in) { + return "FLINK PRODUCE " + in; + } + }); + } + } + ``` + +1. Add the snippet to run the Flink Producer. ++ :::image type="content" source="./media/flink-eventhub/testing-flink.png" alt-text="Screenshot showing how to test Flink in Event Hubs." border="true" lightbox="./media/flink-eventhub/testing-flink.png"::: ++1. Once the code is executed, the events are stored in the topic **“TEST”** ++ :::image type="content" source="./media/flink-eventhub/events-stored-in-topic.png" alt-text="Screenshot showing Event Hubs stored in topic." border="true" lightbox="./media/flink-eventhub/events-stored-in-topic.png"::: |
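Once the producer shown above is packaged into a jar, submitting it to the cluster can look like the following minimal sketch. The jar name is an assumption, the entry class follows the sample code, and `producer.config` must be reachable at the path the code expects.

```
# Submit the packaged producer to the Flink cluster (jar name is an assumption).
bin/flink run -c com.example.app.FlinkTestProducer FlinkTestProducer.jar
```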
hdinsight-aks | Flink Job Management | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-job-management.md | + + Title: Flink job management in HDInsight on AKS +description: HDInsight on AKS provides a feature to manage and submit Apache Flink jobs directly through the Azure portal ++ Last updated : 09/07/2023+++# Flink job management +++HDInsight on AKS provides a feature to manage and submit Apache Flink jobs directly through the Azure portal (user-friendly interface) and ARM Rest APIs. ++This feature empowers users to efficiently control and monitor their Flink jobs without requiring deep cluster-level knowledge. ++## Benefits ++- **Simplified job management**: With the native integration of Apache Flink in the Azure portal, users no longer require extensive knowledge of Flink clusters to submit, manage, and monitor jobs. ++- **User-Friendly REST API**: HDInsight on AKS provides user friendly ARM Rest APIs to submit and manage Flink jobs. Users can submit Flink jobs from any Azure service using these Rest APIs. ++- **Effortless job updates and state management**: The native Azure portal integration provides a hassle-free experience for updating jobs and restoring them to their last saved state (savepoint). This functionality ensures continuity and data integrity throughout the job lifecycle. ++- **Automating Flink job using Azure pipeline**: Using HDInsight on AKS, Flink users have access to user-friendly ARM Rest API, you can seamlessly integrate Flink job operations into your Azure Pipeline. Whether you're launching new jobs, updating running jobs, or performing various job operations, this streamlined approach eliminates manual steps. It empowers you to manage your Flink cluster efficiently. ++## Prerequisites ++There are some prerequisites before submitting and managing jobs from portal or Rest APIs. ++- Create a directory in the primary storage account of the cluster to upload the job jar. ++- If the user wants to take savepoints, then create a directory in the storage account for job savepoints. ++ :::image type="image" source="./media/flink-job-management/create-directory.png" alt-text="Screenshot shows directory structure." border="true" lightbox="./media/flink-job-management/create-directory.png"::: +++## Key features and operations ++- **New job submission**: Users can effortlessly submit a new Flink, eliminating the need for complex configurations or external tools. ++- **Stop and start jobs with savepoints**: Users can gracefully stop and start their Flink jobs from their previous state (Savepoint). Savepoints ensure that job progress is preserved, enabling seamless resumptions. ++- **Job updates**: User can update the running job after updating the jar on storage account. This update automatically take the savepoint and start the job with a new jar. ++- **Stateless updates**: Performing a fresh restart for a job is simplified through stateless updates. This feature allows users to initiate a clean restart using updated job jar. ++- **Savepoint management**: At any given moment, users can create savepoints for their running jobs. These savepoints can be listed and used to restart the job from a specific checkpoint as needed. ++- **Cancel**: This cancels the job permanently. ++- **Delete**: Delete job history record. ++## Options to manage jobs in HDInsight on AKS ++HDInsight on AKS provides ways to manage Flink jobs. 
++- [Azure portal](#azure-portal) +- [ARM Rest API](#arm-rest-api) ++### <a id="azure-portal">Job Management from Azure portal</a> ++To run the Flink job from portal go to: ++Portal --> HDInsight on AKS Cluster Pool --> Flink Cluster --> Settings --> Flink Jobs +++- **New job:** To submit a new job, upload the job jars to the storage account and create a savepoint directory. Complete the template with the necessary configurations and then submit the job. ++ :::image type="image" source="./media/flink-job-management/create-new-job.png" alt-text="Screenshot shows how to create new job." border="true" lightbox="./media/flink-job-management/create-new-job.png"::: ++ **Property details:** ++ | Property | Description | Default Value | Mandatory | + | -- | - | -- | - | + | Job name | Unique name for job. This is displayed on portal. Job name should be in small latter. | | Yes | + | Jar path | Storage path for job jar. Users should create directory in cluster storage and upload job jar.| Yes + | Entry class | Entry class for job from which job execution starts. | | Yes | + | Args | Argument for main program of job. Separate all arguments with spaces. | | No | + | parallelism | Job Flink Parallelism. | 2 | Yes | + | savepoint.directory | Savepoint directory for job. It is recommended that users should create a new directory for job savepoint in storage account. | `abfs://<container>@<account>/<deployment-ID>/savepoints` | No | ++ Once the job is launched, the job status on the portal is **RUNNING**. ++- **Stop:** Stop job did not require any parameter, user can stop the job by selecting the action. ++ :::image type="image" source="./media/flink-job-management/stop-job.png" alt-text="Screenshot shows how user can stop job." border="true" lightbox="./media/flink-job-management/stop-job.png"::: ++ Once the job is stopped, the job status on the portal is **STOPPED**. ++- **Start:** This action starts the job from savepoint. To start the job, select the stopped job and start it. ++ :::image type="image" source="./media/flink-job-management/start-job-savepoint.png" alt-text="Screenshot shows how user start job." border="true" lightbox="./media/flink-job-management/start-job-savepoint.png"::: ++ Fill the flow template with the required options and start it. Users need to select the savepoint from which user wants to start the job. By default, it takes the last successful savepoint. ++ :::image type="image" source="./media/flink-job-management/fill-flow-template.png" alt-text="Screenshot shows how fill flow template." border="true" lightbox="./media/flink-job-management/fill-flow-template.png"::: ++ **Property details**: ++ | Property | Description | Default Value | Mandatory | + | -- | - | -- | - | + | Args | Argument for main program of job. All arguments should be separated by space. | | No | + | Last savepoint | Last successful savepoint take before stopping job. This will used by default if not savepoint is selected. | | Not Editable | + | Save point name | Users can list the available savepoint for job and select one to start the job. | | No | ++ Once the job is started, the job status on the portal will be **RUNNING**. ++- **Update:** Update helps to restart jobs with updated job code. Users need to update the latest job jar in storage location and update the job from portal. This update stops the job with savepoint and starts again with latest jar. 
++ :::image type="image" source="./media/flink-job-management/restart-job-with-updated-code.png" alt-text="Screenshot shows how restart jobs with updated job code." border="true" lightbox="./media/flink-job-management/restart-job-with-updated-code.png"::: ++ Template for updating job. + + :::image type="image" source="./media/flink-job-management/template-for-updating-job.png" alt-text="Screenshot shows template for updating job." border="true" lightbox="./media/flink-job-management/template-for-updating-job.png"::: ++ Once the job is updated, the job status on the portal is "RUNNING." ++- **Stateless update:** This job is like an update, but it involves a fresh restart of the job with the latest code. ++ :::image type="image" source="./media/flink-job-management/stateless-update.png" alt-text="Screenshot shows fresh restart of the job with the latest code." border="true" lightbox="./media/flink-job-management/stateless-update.png"::: ++ Template for updating job. ++ :::image type="image" source="./media/flink-job-management/template-for-updating-stateless-job.png" alt-text="Screenshot shows template for updating stateless job." border="true" lightbox="./media/flink-job-management/template-for-updating-stateless-job.png"::: ++ **Property details**: + + | Property | Description | Default Value | Mandatory | + | -- | - | -- | - | + | Args | Argument for main program of job. Separate all arguments with space. | | No | ++ Once the job is updated, the job status on the portal is RUNNING. + +- **Savepoint:** Take the savepoint for the Flink Job. ++ :::image type="image" source="./media/flink-job-management/savepoint-flink-job.png" alt-text="Screenshot shows savepoint for the Flink Job." border="true" lightbox="./media/flink-job-management/savepoint-flink-job.png"::: ++ Savepoint is time consuming process, and it takes some time. You can see job action status as in-progress. ++ :::image type="image" source="./media/flink-job-management/job-action-status.png" alt-text="Screenshot shows job action status." border="true" lightbox="./media/flink-job-management/job-action-status.png"::: ++- **Cancel:** This job helps user to terminate the job. ++ :::image type="image" source="./media/flink-job-management/terminate-job.png" alt-text="Screenshot shows how user can terminate the job." border="true" lightbox="./media/flink-job-management/terminate-job.png"::: ++- **Delete:** Delete job data from portal. ++ :::image type="image" source="./media/flink-job-management/delete-job-data.png" alt-text="Screenshot shows how user can delete job data from portal." border="true" lightbox="./media/flink-job-management/delete-job-data.png"::: ++- **View Job details:** To view the job detail user can click on job name, it gives the details about the job and last action result. ++ :::image type="image" source="./media/flink-job-management/view-job-details.png" alt-text="Screenshot shows how to view job details." border="true" lightbox="./media/flink-job-management/view-job-details.png"::: ++ For any failed action, this job json give detailed exceptions and reasons for failure. ++### <a id="arm-rest-api">Job Management Using Rest API</a> ++HDInsight on AKS - Flink supports user friendly ARM Rest APIs to submit job and manage job. Using this Flink REST API, you can seamlessly integrate Flink job operations into your Azure Pipeline. 
Whether you're launching new jobs, updating running jobs, or performing various job operations, this streamlined approach eliminates manual steps and empowers you to manage your Flink cluster efficiently. ++#### Base URL format for Rest API ++See following URL for rest API, users need to replace subscription, resource group, cluster pool, cluster name and HDInsight on AKS API version in this before using it. + `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runjob?api-version={{API_VERSION}}` ++Using this REST API, users can initiate new jobs, stop jobs, start jobs, create savepoints, cancel jobs, and delete jobs. The current API_VERSION is 2023-06-01-preview. ++#### Rest API Authentication ++To authenticate Flink ARM Rest API users, need to get the bearer token or access token for ARM resource. To authenticate Azure ARM (Azure Resource Manager) REST API using a service principal, you can follow these general steps: ++- Create a Service Principal. ++ `az ad sp create-for-rbac --name <your-SP-name>` + +- Give owner permission to SP for `flink` cluster. ++- Login with service principal. ++ `az login --service-principal -u <client_id> -p <client_secret> --tenant <tenant_id>` ++- Get access token. ++ `$token = az account get-access-token --resource=https://management.azure.com/ | ConvertFrom-Json` ++ `$tok = $token.accesstoken` ++ Users can use token in URL shown. ++ `$data = Invoke-RestMethod -Uri $restUri -Method GET -Headers @{ Authorization = "Bearer $tok" }` + +**Authentication using Managed Identity:** Users can utilize resources that support Managed Identity to make calls to the Job REST API. For more details, please refer to the [Managed Identity](../../active-directory/managed-identities-azure-resources/tutorial-linux-vm-access-arm.md) documentation. + +#### LIST of APIs and Parameters ++- **New Job:** Rest API to submit new job to Flink. + + | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | + + *Request Body:* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "NEW", + "jobJarDirectory": "<JOB_JAR_STORAGE_PATH>", + "jarName": "<JOB_JAR_NAME>", + "entryClass": "<JOB_ENTRY_CLASS>", + ΓÇ£argsΓÇ¥: ΓÇ¥<JOB_JVM_ARGUMENT>ΓÇ¥ + "flinkConfiguration": { + "parallelism": "<JOB_PARALLELISM>", + "savepoint.directory": "<JOB_SAVEPOINT_DIRECTORY_STORAGE_PATH>" + } + } + } + ``` + **Property details for JSON body:** + + | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job.It should be ΓÇ£FlinkJobΓÇ¥ | | Yes| + | jobName | Unique name for job. This is displayed on portal. Job name should be in small latter.| | Yes | + | action | It indicates operation type on job. It should be ΓÇ£NEWΓÇ¥ always for new job launch. | | Yes | + | jobJarDirectory | Storage path for job jar directory. Users should create directory in cluster storage and upload job jar.| Yes | + | jarName | Name of job jar. | | Yes | + |entryClass | Entry class for job from which job execution starts. | | Yes | + | args | Argument for main program of job. Separate arguments with space. 
| | No | + | parallelism | Job Flink Parallelism. | 2 | Yes | + | savepoint.directory | Savepoint directory for job. It is recommended that users should create a new directory for job savepoint in storage account. | `abfs://<container>@<account>/<deployment-ID>/savepoints`| No | ++ Example: + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Stop job:** Rest API for stopping current running job. ++ | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "STOP" + } + } + ``` ++ Property details for JSON body: ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be ΓÇ£FlinkJobΓÇ¥ | Yes | + | jobName | Job Name, which is used for launching the job | Yes | + | action | It should be ΓÇ£STOPΓÇ¥ | Yes | ++ Example: + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Start job:** Rest API to start STOPPED job. + + | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | +++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "START", + "savePointName": "<SAVEPOINT_NAME>" + } + } + ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be ΓÇ£FlinkJobΓÇ¥ | Yes | + | jobName | Job Name that is used for launching the job. | Yes | + | action | It should be ΓÇ£STARTΓÇ¥ | Yes | + | savePointName | Save point name to start the job. It is optional property, by default start operation take last successful savepoint. | No | ++ **Example:** + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Update job:** Rest API for updating current running job. ++ | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "UPDATE", + ΓÇ£argsΓÇ¥ : ΓÇ£<JOB_JVM_ARGUMENT>ΓÇ¥, + "savePointName": "<SAVEPOINT_NAME>" + } + } ++ ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be ΓÇ£FlinkJobΓÇ¥ | | Yes | + | jobName | Job Name that is used for launching the job. | | Yes | + | action | It should be ΓÇ£UPDATEΓÇ¥ always for new job launch. 
| | Yes | + | args | Job JVM arguments | | No | + | savePointName | Save point name to start the job. It is optional property, by default start operation will take last successful savepoint.| | No | ++ Example: + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Stateless update job:** Rest API for stateless update. + + | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "STATELESS_UPDATE", + ΓÇ£argsΓÇ¥ : ΓÇ£<JOB_JVM_ARGUMENT>ΓÇ¥ + } + } + ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be ΓÇ£FlinkJobΓÇ¥ | | Yes | + | jobName | Job Name that is used for launching the job. | | Yes | + | action | It should be ΓÇ£STATELESS_UPDATEΓÇ¥ always for new job launch. | | Yes | + | args | Job JVM arguments | | No | ++ **Example:** + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Savepoint:** Rest API to trigger savepoint for job. + + | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "SAVEPOINT" + } + } + ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be ΓÇ£FlinkJobΓÇ¥ | | Yes | + | jobName | Job Name that is used for launching the job. | | Yes | + | action | It should be ΓÇ£SAVEPOINTΓÇ¥ always for new job launch. | | Yes | ++ **Example:** + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **List savepoint:** Rest API to list all the savepoint from savepoint directory. ++ | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "LIST_SAVEPOINT" + } + } + ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. 
It should be ΓÇ£FlinkJobΓÇ¥ | | Yes | + | jobName | Job Name which is used for launching the job | | Yes | + | action | It should be ΓÇ£LIST_SAVEPOINTΓÇ¥ | | Yes | ++ **Example:** + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Cancel:** Rest API to cancel the job. ++ | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "CANCEL" + } + } + ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be `FlinkJob` | | Yes | + | jobName | Job Name that is used for launching the job. | | Yes | + | action | It should be CANCEL. | | Yes | ++ **Example:** + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **Delete:** Rest API to delete job. ++ | Option | Value | + | -- | - | + | Method | POST | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/runJob?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | ++ *Request Body* ++ ``` + { + "properties": { + "jobType": "FlinkJob", + "jobName": "<JOB_NAME>", + "action": "DELETE" + } + } + ``` ++ **Property details for JSON body:** ++ | Property | Description | Default Value | Mandatory | + | -- | -- | - | | + | jobType | Type of Job. It should be ΓÇ£FlinkJobΓÇ¥ | | Yes | + | jobName | Job Name that is used for launching the job. | | Yes | + | action | It should be DELETE. | | Yes | ++ **Example:** + + `Invoke-RestMethod -Uri $restUri -Method POST -Headers @{ Authorization = "Bearer $tok" } -Body $jsonString -ContentType "application/json"` ++- **List Jobs:** Rest API to list all the jobs and status of current action. ++ | Option | Value | + | -- | - | + | Method | GET | + | URL | `https://management.azure.com/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/jobs?api-version={{API_VERSION}}` | + | Header | Authorization = "Bearer $token" | + + **Output:** ++ ``` + { + "value": [ + { + "id": "/subscriptions/{{USER_SUBSCRIPTION}}/resourceGroups/{{USER_RESOURCE_GROUP}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSER_POOL}}/clusters/{{FLINK_CLUSTER}}/jobs/job1", + "properties": { + "jobType": "FlinkJob", + "jobName": "job1", + "jobJarDirectory": "<JOB_JAR_STORAGE_PATH>", + "jarName": "<JOB_JAR_NAME>", + "action": "STOP", + "entryClass": "<JOB_ENTRY_CLASS>", + "flinkConfiguration": { + "parallelism": "2", + "savepoint.directory": "<JOB_SAVEPOINT_DIRECTORY_STORAGE_PATH>s" + }, + "jobId": "20e9e907eb360b1c69510507f88cdb7b", + "status": "STOPPED", + "jobOutput": "Savepoint completed. 
Path: <JOB_SAVEPOINT_DIRECTORY_STORAGE_PATH>s/savepoint-20e9e9-8a48c6b905e5", + "actionResult": "SUCCESS", + "lastSavePoint": "<JOB_SAVEPOINT_DIRECTORY_STORAGE_PATH>s/savepoint-20e9e9-8a48c6b905e5" + } + } + ] + } + ``` ++> [!NOTE] +> While an action is in progress, `actionResult` shows the value `IN_PROGRESS`. On successful completion, it shows `SUCCESS`, and in case of failure, it shows `FAILED`. |
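For readers scripting these job-management calls outside PowerShell, the following is a minimal Python sketch of the same pattern, shown here for the savepoint action. It is only a sketch, not part of the published sample: it assumes the `azure-identity` and `requests` packages, and every angle-bracket placeholder (subscription, resource group, cluster pool, cluster, API version, job name) is a stand-in to be replaced.

```python
# Minimal sketch (not the article's sample): issue the runJob REST call from Python.
# Assumes the azure-identity and requests packages; all angle-bracket values are placeholders.
import json

import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<USER_SUBSCRIPTION>"
RESOURCE_GROUP = "<USER_RESOURCE_GROUP>"
CLUSTER_POOL = "<CLUSTER_POOL>"
FLINK_CLUSTER = "<FLINK_CLUSTER>"
API_VERSION = "<API_VERSION>"

# Acquire a bearer token for the Azure Resource Manager endpoint.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.HDInsight"
    f"/clusterpools/{CLUSTER_POOL}/clusters/{FLINK_CLUSTER}/runJob"
    f"?api-version={API_VERSION}"
)

# Request body for the SAVEPOINT action; the other bodies from the tables above
# (STATELESS_UPDATE, LIST_SAVEPOINT, CANCEL, DELETE) can be substituted unchanged.
body = {"properties": {"jobType": "FlinkJob", "jobName": "<JOB_NAME>", "action": "SAVEPOINT"}}

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    data=json.dumps(body),
)
print(response.status_code, response.text)
```

The list-jobs call shown last in the tables above is a plain GET to the `jobs` endpoint with the same bearer token.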
hdinsight-aks | Flink Job Orchestration | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-job-orchestration.md | + + Title: Azure data factory managed airflow - HDInsight on AKS +description: Learn Flink job orchestration using Azure Data Factory managed airflow ++ Last updated : 09/23/2023+++# Azure data factory managed airflow +++This article covers managing HDInsight Flink job using Azure REST API ([refer Job Management REST API section in this tutorial](flink-job-management.md)) and orchestration data pipeline with Azure Data Factory Managed Airflow. [Azure Data Factory Managed Airflow](/azure/data-factory/concept-managed-airflow) service is a simple and efficient way to create and manage [Apache Airflow](https://airflow.apache.org/) environments, enabling you to run data pipelines at scale easily. ++Apache Airflow is an open-source platform that programmatically creates, schedules, and monitors complex data workflows. It allows you to define a set of tasks, called operators, that can be combined into directed acyclic graphs (DAGs) to represent data pipelines. ++The following diagram shows the placement of Airflow, Key Vault, and HDInsight on AKS in Azure. +++Multiple Azure Service Principals are created based on the scope to limit the access it needs and to manage the client credential life cycle independently. ++It is recommended to rotate access keys or secrets periodically. ++## Setup steps ++1. [Setup Flink Cluster](flink-create-cluster-portal.md) ++1. Upload your Flink Job jar to the storage account -  It can be the primary storage account associated with the Flink cluster or any other storage account, where Assign the “Storage Blob Data Owner” role to the user-assigned MSI used for the cluster to this storage account. ++1. Azure Key Vault - You can follow [this tutorial to create a new Azure Key Vault](/azure/key-vault/general/quick-create-portal/) in case, if you don't have one. ++1. Create [Azure AD Service Principal](/cli/azure/ad/sp/) to access Key Vault – Grant permission to access Azure Key Vault with the “Key Vault Secrets Officer” role, and make a note of ‘appId’, ‘password’, and ‘tenant’ from the response. We need to use the same for Airflow to use Key Vault storage as backends for storing sensitive information. ++ ``` + az ad sp create-for-rbac -n <sp name> --role “Key Vault Secrets Officer” --scopes <key vault Resource ID> + ``` +++1. Create Managed Airflow [enable with Azure Key Vault to store and manage your sensitive information in a secure and centralized manner](/azure/data-factory/enable-azure-key-vault-for-managed-airflow). By doing this, you can use variables and connections, and they automatically be stored in Azure Key Vault. The name of connections and variables need to be prefixed by variables_prefix  defined in AIRFLOW__SECRETS__BACKEND_KWARGS. For example, If variables_prefix has a value as  hdinsight-aks-variables then for a variable key of hello, you would want to store your Variable at hdinsight-aks-variable -hello. 
++ - Add the following settings for the Airflow configuration overrides in integrated runtime properties: ++ - AIRFLOW__SECRETS__BACKEND: + `"airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend"` ++ - AIRFLOW__SECRETS__BACKEND_KWARGS: + `"{"connections_prefix": "airflow-connections", "variables_prefix": "hdinsight-aks-variables", "vault_url": <your keyvault uri>}”` ++ - Add the following setting for the Environment variables configuration in the Airflow integrated runtime properties: ++ - AZURE_CLIENT_ID = `<App Id from Create Azure AD Service Principal>` ++ - AZURE_TENANT_ID = `<Tenant from Create Azure AD Service Principal> ` ++ - AZURE_CLIENT_SECRET = `<Password from Create Azure AD Service Principal> ` ++ Add Airflow requirements - [apache-airflow-providers-microsoft-azure](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/https://docsupdatetracker.net/index.html) ++ :::image type="content" source="./media/flink-job-orchestration/airflow-configuration-environment-variable.png" alt-text="Screenshot shows airflow configuration and environment variables." lightbox="./media/flink-job-orchestration/airflow-configuration-environment-variable.png"::: ++ +1. Create [Azure AD Service Principal](/cli/azure/ad/sp/) to access Azure – Grant permission to access HDInsight AKS Cluster with Contributor role, make a note of appId, password, and tenant from the response. ++ `az ad sp create-for-rbac -n <sp name> --role Contributor --scopes <Flink Cluster Resource ID>` ++1. Create the following secrets in your key vault with the value from the previous AD Service principal appId, password, and tenant, prefixed by property variables_prefix defined in AIRFLOW__SECRETS__BACKEND_KWARGS. The DAG code can access any of these variables without variables_prefix. ++ - hdinsight-aks-variables-api-client-id=`<App ID from previous step> ` ++ - hdinsight-aks-variables-api-secret=`<Password from previous step> ` ++ - hdinsight-aks-variables-tenant-id=`<Tenant from previous step> ` ++ ```python + from airflow.models import Variable ++ def retrieve_variable_from_akv(): ++ variable_value = Variable.get("client-id") ++ print(variable_value) + ``` ++ +## DAG definition ++A DAG (Directed Acyclic Graph) is the core concept of Airflow, collecting Tasks together, organized with dependencies and relationships to say how they should run. ++There are three ways to declare a DAG: ++ 1. You can use a context manager, which adds the DAG to anything inside it implicitly ++ 1. You can use a standard constructor, passing the DAG into any operators you use ++ 1. You can use the @dag decorator to turn a function into a DAG generator (from airflow.decorators import dag) ++DAGs are nothing without Tasks to run, and those are come in the form of either Operators, Sensors or TaskFlow. ++You can read more details about DAGs, Control Flow, SubDAGs, TaskGroups, etc. directly from [Apache Airflow](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html).  ++## DAG execution ++Example code is available on the [git](https://github.com/Azure-Samples/hdinsight-aks/blob/main/flink/airflow-python-sample-code); download the code locally on your computer and upload the wordcount.py to a blob storage. Follow the [steps](/azure/data-factory/how-does-managed-airflow-work#steps-to-import) to import DAG into your Managed Airflow created during setup. ++The wordcount.py is an example of orchestrating a Flink job submission using Apache Airflow with HDInsight on AKS. 
The example is based on the wordcount example provided on [Apache Flink](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/examples/). ++The DAG has two tasks: ++- get OAuth Token ++- Invoke HDInsight Flink Job Submission Azure REST API to submit a new job ++The DAG expects to have setup for the Service Principal, as described during the setup process for the OAuth Client credential and pass the following input configuration for the execution. ++### Execution steps ++1. Execute the DAG from the [Airflow UI](https://airflow.apache.org/docs/apache-airflow/stable/ui.html), you can open the Azure Data Factory Managed Airflow UI by clicking on Monitor icon. ++ :::image type="content" source="./media/flink-job-orchestration/airflow-user-interface-step-1.png" alt-text="Screenshot shows open the Azure data factory managed airflow UI by clicking on monitor icon." lightbox="./media/flink-job-orchestration/airflow-user-interface-step-1.png"::: ++1. Select the “FlinkWordCountExample” DAG from the “DAGs” page. ++ :::image type="content" source="./media/flink-job-orchestration/airflow-user-interface-step-2.png" alt-text="Screenshot shows select the Flink word count example." lightbox="./media/flink-job-orchestration/airflow-user-interface-step-2.png"::: ++1. Click on the “execute” icon from the top right corner and select “Trigger DAG w/ config”. ++ :::image type="content" source="./media/flink-job-orchestration/airflow-user-interface-step-3.png" alt-text="Screenshot shows select execute icon." lightbox="./media/flink-job-orchestration/airflow-user-interface-step-3.png"::: ++ +1. Pass required configuration JSON ++ ```JSON + { ++ "jarName":"WordCount.jar", ++ "jarDirectory":"abfs://filesystem@<storageaccount>.dfs.core.windows.net", ++ "subscritpion":"<cluster subscription id>", ++ "rg":"<cluster resource group>", ++ "poolNm":"<cluster pool name>", ++ "clusterNm":"<cluster name>" ++ } + ``` ++1. Click on “Trigger” button, it starts the execution of the DAG. ++1. You can visualize the status of DAG tasks from the DAG run ++ :::image type="content" source="./media/flink-job-orchestration/dag-task-status.png" alt-text="Screenshot shows dag task status." lightbox="./media/flink-job-orchestration/dag-task-status.png"::: ++1. Validate the job execution from portal ++ :::image type="content" source="./media/flink-job-orchestration/validate-job-execution.png" alt-text="Screenshot shows validate job execution." lightbox="./media/flink-job-orchestration/validate-job-execution.png"::: ++1. Validate the job from “Apache Flink Dashboard” ++ :::image type="content" source="./media/flink-job-orchestration/apache-flink-dashboard.png" alt-text="Screenshot shows apache Flink dashboard." 
lightbox="./media/flink-job-orchestration/apache-flink-dashboard.png"::: ++## Example code ++ This is an example of orchestrating data pipeline using Airflow with HDInsight on AKS + + The example is based on wordcount example provided on [Apache Flink](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/examples/) + + The DAG expects to have setup for Service Principal for the OAuth Client credential and pass following input configuration for the execution + ```JSON + { + 'jarName':'WordCount.jar', + 'jarDirectory':'abfs://filesystem@<storageaccount>.dfs.core.windows.net', + 'subscritpion':'<cluster subscription id>', + 'rg':'<cluster resource group>', + 'poolNm':'<cluster pool name>', + 'clusterNm':'<cluster name>' + } + + ``` ++ Refer to the [sample code](https://github.com/Azure-Samples/hdinsight-aks/blob/main/flink/airflow-python-sample-code). + + |
hdinsight-aks | Flink Overview | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-overview.md | + + Title: What is Apache Flink in Azure HDInsight on AKS? (Preview) +description: An introduction to Apache Flink in Azure HDInsight on AKS. ++ Last updated : 08/29/2023+++# What is Apache Flink in Azure HDInsight on AKS? (Preview) +++[Apache Flink](https://flink.apache.org/) is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations and stateful streaming applications at in-memory speed and at any scale. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. Therefore, an application can use unlimited amounts of vCPUs, main memory, disk and network IO. Moreover, Flink easily maintains large application state. Its asynchronous and incremental checkpointing algorithm ensures minimal influence on processing latencies while guaranteeing exactly once state consistency. ++Apache Flink is a massively scalable analytics engine for stream processing. ++Some of the key features that Flink offers are: ++- Operations on bounded and unbounded streams +- In memory performance +- Ability for both streaming and batch computations +- Low latency, high throughput operations +- Exactly once processing +- High Availability +- State and fault tolerance +- Fully compatible with Hadoop ecosystem +- Unified SQL APIs for both stream and batch +++## Why Apache Flink? ++Apache Flink is an excellent choice to develop and run many different types of applications due to its extensive features set. FlinkΓÇÖs features include support for stream and batch processing, sophisticated state management, event-time processing semantics, and exactly once consistency guarantees for state. Flink doesn't have a single point of failure. Flink has been proven to scale to thousands of cores and terabytes of application state, delivers high throughput and low latency, and powers some of the worldΓÇÖs most demanding stream processing applications. ++- **Fraud detection**: Flink can be used to detect fraudulent transactions or activities in real time by applying complex rules and machine learning models on streaming data. +- **Anomaly detection**: Flink can be used to identify outliers or abnormal patterns in streaming data, such as sensor readings, network traffic, or user behavior. +- **Rule-based alerting**: Flink can be used to trigger alerts or notifications based on predefined conditions or thresholds on streaming data, such as temperature, pressure, or stock prices. +- **Business process monitoring**: Flink can be used to track and analyze the status and performance of business processes or workflows in real time, such as order fulfillment, delivery, or customer service. +- **Web application (social network)**: Flink can be used to power web applications that require real-time processing of user-generated data, such as messages, likes, comments, or recommendations. + +Read more on common use cases described on [Apache Flink Use cases](https://flink.apache.org/use-cases/#use-cases) ++## Apache Flink Cluster Deployment Types +Flink can execute applications in Session mode or Application mode. Currently HDInsight on AKS supports only Session clusters. You can run multiple Flink jobs on a Session cluster. 
++## Apache Flink Job Management ++Flink schedules jobs using three distributed components: Job Manager, Task Manager, and Job Client, which are set up in a leader-follower pattern. ++**Flink Job**: A Flink job or program consists of multiple tasks. Tasks are the basic unit of execution in Flink. Each Flink task has multiple instances depending on the level of parallelism, and each instance is executed on a Task Manager. ++**Job manager**: The Job Manager acts as a scheduler and schedules tasks on Task Managers. ++**Task manager**: Task Managers come with one or more slots to execute tasks in parallel. ++**Job client**: The Job Client communicates with the Job Manager to submit Flink jobs. ++**Flink Web UI**: Flink features a web UI to inspect, monitor, and debug running applications. +++## Checkpoints in Apache Flink ++Every function and operator in Flink can be stateful. Stateful functions store data across the processing of individual elements/events, making state a critical building block for any type of more elaborate operation. In order to make state fault tolerant, Flink needs to **checkpoint the state**. Checkpoints allow Flink to recover state and positions in the streams to give the application the same semantics as a failure-free execution; they therefore play an important role in letting Flink recover both its state and the corresponding stream positions after a failure. ++Checkpointing is enabled in HDInsight on AKS Flink by default. Default settings on HDInsight on AKS maintain the last five checkpoints in persistent storage. If your job fails, it can be restarted from the latest checkpoint. ++## State Backends ++Backends determine where state is stored. Stream processing applications are often stateful, *remembering* information from processed events and using it to influence further event processing. In Flink, the remembered information, that is, state, is stored locally in the configured state backend. ++When checkpointing is activated, such state is persisted upon checkpoints to guard against data loss and recover consistently. How the state is represented internally, and how and where it's persisted upon checkpoints, depends on the chosen **State Backend**. HDInsight on AKS uses RocksDB as the default state backend. ++**Supported state backends:** ++* HashMapStateBackend +* EmbeddedRocksDBStateBackend ++### The HashMapStateBackend ++The `HashMapStateBackend` holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables that store the values, triggers, etc. ++The HashMapStateBackend is encouraged for: ++* Jobs with large state, long windows, large key/value states. +* All high-availability setups. ++It's also recommended to set managed memory to zero. This value ensures that the maximum amount of memory is allocated for user code on the JVM. +Unlike `EmbeddedRocksDBStateBackend`, the `HashMapStateBackend` stores data as objects on the heap, so it's unsafe to reuse objects. ++### The EmbeddedRocksDBStateBackend ++The `EmbeddedRocksDBStateBackend` holds in-flight data in a [RocksDB](http://rocksdb.org) database that is (by default) stored in the Task Manager's local data directories. Unlike `HashMapStateBackend`, which stores Java objects, data is stored as serialized byte arrays, mainly defined by the type serializer, so key comparisons are byte-wise instead of using Java's `hashCode()` and `equals()` methods. ++By default, we use RocksDB as the state backend. RocksDB is an embeddable persistent key-value store for fast storage. 
++``` +state.backend: rocksdb +state.checkpoints.dir: <STORAGE_LOCATION> +``` +By default, HDInsight on AKS stores the checkpoints in the storage account configured by the user, so that the checkpoints are persisted. ++### Incremental Checkpoints ++RocksDB supports incremental checkpoints, which can dramatically reduce the checkpointing time in comparison to full checkpoints. Instead of producing a full, self-contained backup of the state backend, incremental checkpoints only record the changes that happened since the latest completed checkpoint. An incremental checkpoint builds upon (typically multiple) previous checkpoints. ++Flink applies RocksDB's internal compaction mechanism in a way that is self-consolidating over time. As a result, the incremental checkpoint history in Flink doesn't grow indefinitely, and old checkpoints are eventually subsumed and pruned automatically. Recovery time of incremental checkpoints may be longer or shorter compared to full checkpoints. If your network bandwidth is the bottleneck, it may take a bit longer to restore from an incremental checkpoint, because it implies fetching more data (more deltas). ++Restoring from an incremental checkpoint is faster if the bottleneck is your CPU or IOPS, because an incremental restore doesn't rebuild the local RocksDB tables from Flink's canonical key-value snapshot format (used in savepoints and full checkpoints). ++While we encourage the use of incremental checkpoints for large state, you need to enable this feature manually: ++* Setting `state.backend.incremental: true` in your `flink-conf.yaml` enables incremental checkpoints, unless the application overrides this setting in the code. On HDInsight on AKS, this setting is true by default. +* You can alternatively configure this value directly in the code (overrides the config default): ++``` +EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend(true); +``` ++By default, we preserve the last five checkpoints in the configured checkpoint directory. ++This value can be changed with the following configuration: ++`state.checkpoints.num-retained: 5` ++## Windowing in Flink ++Windowing is a key feature in stream processing systems such as Apache Flink. Windowing splits the continuous stream into finite batches on which computations can be performed. In Flink, windowing can be applied to the entire stream or on a per-key basis. ++Windowing refers to the process of dividing a stream of events into finite, nonoverlapping segments called windows. This feature allows users to perform computations on specific subsets of data based on time or key-based criteria. ++Windows allow users to split the streamed data into segments that can be processed. Due to the unbounded nature of data streams, there's no point at which all the data is available, because users would be waiting indefinitely for new data points to arrive. Instead, windowing offers a way to define a subset of data points that you can then process and analyze. The trigger defines when the window is considered ready for processing, and the function set for the window specifies how to process the data; a small sketch follows below. ++Learn [more](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/windows/) ++### Reference ++[Apache Flink](https://flink.apache.org/) |
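To make the windowing description above concrete, here is a small, hypothetical PyFlink sketch (assuming the `apache-flink` Python package is available) that counts events per user over tumbling five-second processing-time windows. It uses the built-in `datagen` connector as a throwaway source; the table and column names are made up for illustration and are not part of any HDInsight sample.

```python
# Hypothetical windowing illustration; assumes the apache-flink (PyFlink) package.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Throwaway unbounded source built with the datagen connector; ts is a processing-time column.
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id INT,
        url STRING,
        ts AS PROCTIME()
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '5',
        'fields.user_id.min' = '1',
        'fields.user_id.max' = '3'
    )
""")

# Tumbling five-second windows per user: each result row is one window's event count.
result = t_env.execute_sql("""
    SELECT user_id,
           TUMBLE_START(ts, INTERVAL '5' SECOND) AS window_start,
           COUNT(*) AS events_in_window
    FROM clicks
    GROUP BY user_id, TUMBLE(ts, INTERVAL '5' SECOND)
""")
result.print()  # streams window results until the job is cancelled
```

Because the source is unbounded, the query keeps emitting a row per user and per five-second window until you cancel it.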
hdinsight-aks | Flink Table Api And Sql | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-table-api-and-sql.md | + + Title: Table API and SQL - HDInsight on AKS - Apache Flink +description: Learn about Table API and SQL in HDInsight on AKS - Apache Flink ++ Last updated : 08/29/2023+++# Table API and SQL in HDInsight on AKS - Apache Flink +++Apache Flink features two relational APIs - the Table API and SQL - for unified stream and batch processing. The Table API is a language-integrated query API that allows the composition of queries from relational operators such as selection, filter, and join intuitively. FlinkΓÇÖs SQL support is based on Apache Calcite, which implements the SQL standard. ++The Table API and SQL interfaces integrate seamlessly with each other and FlinkΓÇÖs DataStream API. You can easily switch between all APIs and libraries, which build upon them. ++## Apache Flink SQL ++Like other SQL engines, Flink queries operate on top of tables. It differs from a traditional database because Flink doesn't manage data at rest locally; instead, its queries operate continuously over external tables. ++Flink data processing pipelines begin with source tables and end with sink tables. Source tables produce rows operated over during the query execution; they're the tables referenced in the *FROM* clause of a query. Connectors can be of type HDInsight Kafka, HDInsight HBase, Azure Event Hubs, databases, filesystems, or any other system whose connector lies in the classpath. ++## Using SQL Client in HDInsight on AKS - Flink ++You can refer this article on how to use CLI from [Secure Shell](./flink-web-ssh-on-portal-to-flink-sql.md) on Azure portal. Here are some quick samples of how to get started. + +- To start the SQL client + + ``` + ./bin/sql-client.sh + ``` +- To pass an initialization sql file to run along with sql-client + + ``` + ./sql-client.sh -i /path/to/init_file.sql + ``` ++- To set a configuration in sql-client + + ``` + SET execution.runtime-mode = streaming; + SET sql-client.execution.result-mode = table; + SET sql-client.execution.max-table-result.rows = 10000; + ``` ++## SQL DDL ++Flink SQL supports the following CREATE statements + +- CREATE TABLE +- CREATE DATABASE +- CREATE CATALOG + +Following is an example syntax to define a source table using jdbc connector to connect to MSSQL, with id, name as columns in a **CREATE TABLE** Statement ++```sql +CREATE TABLE student_information ( + id BIGINT, + name STRING, + address STRING, + grade STRING, + PRIMARY KEY (id) NOT ENFORCED + ) WITH ( + 'connector' = 'jdbc', + 'url' = 'jdbc:sqlserver://servername.database.windows.net;database=dbname;encrypt=true;trustServerCertificate=true;create=false;loginTimeout=30', + 'table-name' = 'students', + 'username' = 'username', + 'password' = 'password' + ); +``` ++**CREATE DATABASE** : +```sql +CREATE DATABASE students; +``` +**CREATE CATALOG**: +```sql +CREATE CATALOG myhive WITH ('type'='hive'); +``` +You can run Continuous Queries on Top of these tables +```sql + SELECT id, + COUNT(*) as student_count + FROM student_information + GROUP BY grade; +``` +Write out to **Sink Table** from **Source Table**: +```sql + INSERT INTO grade_counts + SELECT id, + COUNT(*) as student_count + FROM student_information + GROUP BY grade; +``` ++## Adding Dependencies for Apache Flink SQL ++JAR statements are used to add user jars into the classpath or remove user jars from the classpath or show added jars in the classpath in the runtime. 
++Flink SQL supports the following JAR statements: ++- ADD JAR +- SHOW JARS +- REMOVE JAR +```sql +Flink SQL> ADD JAR '/path/hello.jar'; +[INFO] Execute statement succeed. ++Flink SQL> ADD JAR 'hdfs:///udf/common-udf.jar'; +[INFO] Execute statement succeed. ++Flink SQL> SHOW JARS; ++-++| jars | ++-++| /path/hello.jar | +| hdfs:///udf/common-udf.jar | ++-+++Flink SQL> REMOVE JAR '/path/hello.jar'; +[INFO] The specified jar is removed from session classloader. +``` ++## Hive Metastore in HDInsight on AKS - Flink ++Catalogs provide metadata, such as databases, tables, partitions, views, and functions and information needed to access data stored in a database or other external systems. ++In HDInsight on AKS, Flink we support two catalog options: ++**GenericInMemoryCatalog** ++The *GenericInMemoryCatalog* is an in-memory implementation of a catalog. All the objects are available only for the lifetime of the sql session. ++**HiveCatalog** ++The *HiveCatalog* serves two purposes; as persistent storage for pure Flink metadata, and as an interface for reading and writing existing Hive metadata. ++> [!NOTE] +> In HDInsight on AKS, Flink comes with an integrated option of Hive Metastore. You can opt for Hive Metastore during [cluster creation](../flink/flink-create-cluster-portal.md) ++## How to Create and Register Flink Databases to Catalogs ++You can refer this article on how to use CLI and get started with Flink SQL Client from [Secure Shell](./flink-web-ssh-on-portal-to-flink-sql.md) on Azure portal. ++- Start `sql-client.sh` session + + :::image type="content" source="./media/flink-table-sql-api/default-catalog.png" alt-text="Screenshot showing default hive catalog."::: ++ Default_catalog is the default in-memory catalog +- Let us now check default database of in-memory catalog + :::image type="content" source="./media/flink-table-sql-api/default-database-in-memory-catalogs.png" alt-text="Screenshot showing default in-memory catalogs."::: +- Let us create Hive Catalog of version 3.1.2 and use it + + ```sql + CREATE CATALOG myhive WITH ('type'='hive'); + USE CATALOG myhive; + ``` + > [!NOTE] + > HDInsight on AKS Flink supports **Hive 3.1.2** and **Hadoop 3.3.2**. The `hive-conf-dir` is set to location `/opt/hive-conf` + +- Let us create Database in hive catalog and make it default for the session (unless changed). 
+ :::image type="content" source="./media/flink-table-sql-api/create-default-hive-catalog.png" alt-text="Screenshot showing creating database in hive catalog and making it default catalog for the session."::: ++## How to Create and Register Hive Tables to Hive Catalog ++- Follow the instructions on [How to Create and Register Flink Databases to Catalog](#how-to-create-and-register-flink-databases-to-catalogs) +- Let us create Flink Table of connector type Hive without Partition ++ ```sql + CREATE TABLE hive_table(x int, days STRING) WITH ( 'connector' = 'hive', 'sink.partition-commit.delay'='1 s', 'sink.partition-commit.policy.kind'='metastore,success-file'); + ``` +- Insert Data into hive_table + ```sql + INSERT INTO hive_table SELECT 2, '10'; + INSERT INTO hive_table SELECT 3, '20'; + ``` +- Read data from hive_table + ```sql + Flink SQL> SELECT * FROM hive_table; + 2023-07-24 09:46:22,225 INFO org.apache.hadoop.mapred.FileInputFormat[] - Total input files to process : 3 + +-+-+--+ + | op | x | days | + +-+-+--+ + | +I | 3 | 20 | + | +I | 2 | 10 | + | +I | 1 | 5 | + +-+-+--+ + Received a total of 3 rows + ``` + > [!NOTE] + > Hive Warehouse Directory is located in the designated container of storage account chosen during Apache Flink cluster creation, can be found at directory hive/warehouse/ +- Lets create Flink Table of connector type hive with Partition + ```sql + CREATE TABLE partitioned_hive_table(x int, days STRING) PARTITIONED BY (days) WITH ( 'connector' = 'hive', 'sink.partition-commit.delay'='1 s', 'sink.partition-commit.policy.kind'='metastore,success-file'); + ``` +> [!IMPORTANT] +> There is a known limitation in Flink. The last ΓÇÿnΓÇÖ columns are chosen for partitions, irrespective of user defined partition column. [FLINK-32596](https://issues.apache.org/jira/browse/FLINK-32596) The partition key will be wrong when use Flink dialect to create Hive table. |
hdinsight-aks | Flink Web Ssh On Portal To Flink Sql | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/flink-web-ssh-on-portal-to-flink-sql.md | + + Title: How to enter the HDInsight on AKS Flink CLI client using Secure Shell (SSH) on Azure portal +description: How to enter the HDInsight on AKS Flink SQL & DStream CLI client using webssh on Azure portal ++ Last updated : 08/29/2023+++# Access CLI Client using Secure Shell (SSH) on Azure portal +++This example shows how to enter the HDInsight on AKS Flink CLI client using SSH on the Azure portal; it covers both Flink SQL and Flink DataStream. ++## Prerequisites +- You're required to select SSH during [creation](./flink-create-cluster-portal.md) of the Flink cluster ++## Connecting to SSH from the Azure portal ++Once the Flink cluster is created, you can use the **Settings** option on the left pane to access **Secure Shell**. +++## Apache Flink SQL ++#### Connecting to the SQL client ++Change directory to `/opt/flink-webssh/bin` and then execute `./sql-client.sh`. ++++You're now in the Flink SQL client. ++Refer to [this](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sqlclient/) document to perform a few more tests. ++## Apache Flink DataStream ++Flink provides a command-line interface (CLI) `bin/flink` to run programs that are packaged as JAR files and to control their execution. ++The CLI is part of the Secure Shell (SSH) pod; it connects to the running JobManager and uses the client configurations specified at `conf/flink-conf.yaml`. ++Submitting a job means uploading the job's JAR to the SSH pod and initiating the job execution. To illustrate an example for this article, we select a long-running job like `examples/streaming/StateMachineExample.jar`. ++> [!NOTE] +> For managing dependencies, the expectation is to build and submit a fat JAR for the job. ++- Upload the fat job JAR from ABFS to webssh. +- Based on your use case, edit the client configurations using [Flink configuration management](../flink/flink-configuration-management.md) under flink-client-configs. + +- Let us run StateMachineExample.jar ++ ``` + ./bin/flink run \ + --detached \ + ./examples/streaming/StateMachineExample.jar + ``` +> [!NOTE] +> Submitting the job with `--detached` makes the command return after the submission is done. The output contains the ID of the newly submitted job. ++## Reference ++* [Flink SQL Client](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sqlclient/) |
hdinsight-aks | Fraud Detection Flink Datastream Api | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/fraud-detection-flink-datastream-api.md | + + Title: Fraud detection with the Apache Flink DataStream API +description: Learn about Fraud detection with the Apache Flink DataStream API ++ Last updated : 08/29/2023+++# Fraud detection with the Apache Flink DataStream API +++In this article, learn how to run Fraud detection use case with the Apache Flink DataStream API. ++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* IntelliJ Idea community edition installed locally ++## Develop code in IDE ++- For the sample job, refer [Fraud Detection with the DataStream API](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/try-flink/datastream/) +- Build the skeleton of the code using Flink Maven Archetype by using InterlliJ Idea IDE. +- Once the IDE is opened, go to **File** -> **New** -> **Project** -> **Maven Archetype**. +- Enter the details as shown in the image. ++ :::image type="content" source="./media/fraud-detection-flink-datastream-api/maven-archetype.png" alt-text="Screenshot showing Maven Archetype." border="true" lightbox="./media/fraud-detection-flink-datastream-api/maven-archetype.png"::: ++- After you create the Maven Archetype, it generates 2 java classes FraudDetectionJob and FraudDetector. +- Update the `FraudDetector` with the following code. ++ ``` + package spendreport; + + import org.apache.flink.api.common.state.ValueState; + import org.apache.flink.api.common.state.ValueStateDescriptor; + import org.apache.flink.api.common.typeinfo.Types; + import org.apache.flink.configuration.Configuration; + import org.apache.flink.streaming.api.functions.KeyedProcessFunction; + import org.apache.flink.util.Collector; + import org.apache.flink.walkthrough.common.entity.Alert; + import org.apache.flink.walkthrough.common.entity.Transaction; + + public class FraudDetector extends KeyedProcessFunction<Long, Transaction, Alert> { + + private static final long serialVersionUID = 1L; + + private static final double SMALL_AMOUNT = 1.00; + private static final double LARGE_AMOUNT = 500.00; + private static final long ONE_MINUTE = 60 * 1000; + + private transient ValueState<Boolean> flagState; + private transient ValueState<Long> timerState; + + @Override + public void open(Configuration parameters) { + ValueStateDescriptor<Boolean> flagDescriptor = new ValueStateDescriptor<>( + "flag", + Types.BOOLEAN); + flagState = getRuntimeContext().getState(flagDescriptor); + + ValueStateDescriptor<Long> timerDescriptor = new ValueStateDescriptor<>( + "timer-state", + Types.LONG); + timerState = getRuntimeContext().getState(timerDescriptor); + } + + @Override + public void processElement( + Transaction transaction, + Context context, + Collector<Alert> collector) throws Exception { + + // Get the current state for the current key + Boolean lastTransactionWasSmall = flagState.value(); + + // Check if the flag is set + if (lastTransactionWasSmall != null) { + if (transaction.getAmount() > LARGE_AMOUNT) { + //Output an alert downstream + Alert alert = new Alert(); + alert.setId(transaction.getAccountId()); + + collector.collect(alert); + } + // Clean up our state + cleanUp(context); + } + + if (transaction.getAmount() < SMALL_AMOUNT) { + // set the flag to true + flagState.update(true); + + long timer = context.timerService().currentProcessingTime() + ONE_MINUTE; + 
context.timerService().registerProcessingTimeTimer(timer); + + timerState.update(timer); + } + } + + @Override + public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alert> out) { + // remove flag after 1 minute + timerState.clear(); + flagState.clear(); + } + + private void cleanUp(Context ctx) throws Exception { + // delete timer + Long timer = timerState.value(); + ctx.timerService().deleteProcessingTimeTimer(timer); + + // clean up all state + timerState.clear(); + flagState.clear(); + } + } + + ``` ++This job uses a source that generates an infinite stream of credit card transactions for you to process. Each transaction contains an account ID (accountId), timestamp (timestamp) of when the transaction occurred, and US$ amount (amount). The logic is that if transaction of the small amount (< 1.00) immediately followed by a large amount (> 500) it sets off alarm and updates the output logs. It uses data from TransactionIterator following class, which is hardcoded so that account ID 3 is detected as fraudulent transaction. ++For more information, refer [Sample TransactionIterator.java](https://github.com/apache/flink/blob/master/flink-walkthroughs/flink-walkthrough-common/src/main/java/org/apache/flink/walkthrough/common/source/TransactionIterator.java) ++## Create JAR file ++After making the code changes, create the jar using the following steps in IntelliJ Idea IDE ++- Go to **File** -> **Project Structure** -> **Project Settings** -> **Artifacts** +- Click **+** (plus sign) -> **Jar** -> From modules with dependencies. +- Select a **Main Class** (the one with main() method) if you need to make the jar runnable. +- Select **Extract to the target Jar**. +- Click **OK**. +- Click **Apply** and then **OK**. +- The following step sets the "skeleton" to where the jar will be saved to. + :::image type="content" source="./media/fraud-detection-flink-datastream-api/extract-target-jar.png" alt-text="Screenshot showing how to extract target Jar." border="true" lightbox="./media/fraud-detection-flink-datastream-api/extract-target-jar.png"::: ++- To build and save + - Go to **Build -> Build Artifact -> Build** ++ :::image type="content" source="./media/fraud-detection-flink-datastream-api/build-artifact.png" alt-text="Screenshot showing how to build artifact."::: + + :::image type="content" source="./media/fraud-detection-flink-datastream-api/extract-target-jar-1.png" alt-text="Screenshot showing how to extract the target jar."::: ++## Run the job in Apache Flink environment ++- Once the jar is generated, it can be used to submit the job from Flink UI using submit job section. ++ +- After the job is submitted, it's moved to running state, and the Task manager logs will be generated. ++++- From the logs, view the alert is generated for Account ID 3. ++## Reference +* [Fraud Detector v2: State + Time](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/try-flink/datastream/#fraud-detector-v2-state--time--1008465039) +* [Sample TransactionIterator.java](https://github.com/apache/flink/blob/master/flink-walkthroughs/flink-walkthrough-common/src/main/java/org/apache/flink/walkthrough/common/source/TransactionIterator.java) |
hdinsight-aks | Hive Dialect Flink | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/hive-dialect-flink.md | + + Title: Hive dialect in Flink +description: Hive dialect in Flink HDInsight on AKS ++ Last updated : 09/18/2023+++# Hive dialect in Flink +++In this article, learn how to use Hive dialect in HDInsight on AKS - Flink. ++## Introduction ++The user cannot change the default `flink` dialect to hive dialect for their usage on HDInsight on AKS - Flink. All the SQL operations fail once changed to hive dialect with the following error. ++```Caused by: ++*java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader* +``` ++The reason for this issue arises due to an open [Hive Jira](https://issues.apache.org/jira/browse/HIVE-21584). Currently, Hive assumes that the system class loader is an instance of URLClassLoader. In `Java 11`, this assumption is not the case. ++## How to use Hive dialect in Flink ++- Execute the following steps in [webssh](./flink-web-ssh-on-portal-to-flink-sql.md): ++ 1. Remove the existing flink-sql-connector-hive*jar in lib location + ```command + rm /opt/flink-webssh/lib/flink-sql-connector-hive*jar + ``` + 1. Download the below jar in `webssh` pod and add it under the /opt/flink-webssh/lib wget https://aka.ms/hdiflinkhivejdk11jar. + (The above hive jar has the fix [https://issues.apache.org/jira/browse/HIVE-27508](https://issues.apache.org/jira/browse/HIVE-27508)) ++ 1. ``` + mv $FLINK_HOME/opt/flink-table-planner_2.12-1.16.0-0.0.18.jar $FLINK_HOME/lib/flink-table-planner_2.12-1.16.0-0.0.18.jar + ``` ++ 1. ``` + mv $FLINK_HOME/lib/flink-table-planner-loader-1.16.0-0.0.18.jar $FLINK_HOME/opt/flink-table-planner-loader-1.16.0-0.0.18.jar + ``` ++ 1. Add the following keys in the `flink` configuration management under core-site.xml section: + ``` + fs.azure.account.key.<STORAGE>.dfs.core.windows.net: <KEY> + flink.hadoop.fs.azure.account.key.<STORAGE>.dfs.core.windows.net: <KEY> + ``` ++- Here is an overview of [hive-dialect queries](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hive-dialect/queries/overview/) + + - Executing Hive dialect in Flink without partitioning + + ```sql + root [ ~ ]# ./bin/sql-client.sh + Flink SQL> + Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf'); + [INFO] Execute statement succeed. ++ Flink SQL> use catalog myhive; + [INFO] Execute statement succeed. ++ Flink SQL> load module hive; + [INFO] Execute statement succeed. ++ Flink SQL> use modules hive,core; + [INFO] Execute statement succeed. ++ Flink SQL> set table.sql-dialect=hive; + [INFO] Session property has been set. ++ Flink SQL> set sql-client.execution.result-mode=tableau; + [INFO] Session property has been set. 
++ Flink SQL> select explode(array(1,2,3));Hive Session ID = 6ba45be2-360e-4bee-8842-2765c91581c8 + ++ > [!WARNING] + > An illegal reflective access operation has occurred ++ > [!WARNING] + > Illegal reflective access by org.apache.hadoop.hive.common.StringInternUtils (file:/opt/flink-webssh/lib/flink-sql-connector-hive-3.1.2_2.12-1.16-SNAPSHOT.jar) to field java.net.URI.string ++ > [!WARNING] + > Please consider reporting this to the maintainers of org.apache.hadoop.hive.common.StringInternUtils ++ > [!WARNING] + > `Use --illegal-access=warn` to enable warnings of further illegal reflective access operations ++ > [!WARNING] + > All illegal access operations will be denied in a future release + select explode(array(1,2,3)); +++ +-+-+ + | op | col | + +-+-+ + | +I | 1 | + | +I | 2 | + | +I | 3 | + +-+-+ ++ Received a total of 3 rows ++ Flink SQL> create table tttestHive Session ID = fb8b652a-8dad-4781-8384-0694dc16e837 ++ [INFO] Execute statement succeed. ++ Flink SQL> insert into table tttestHive Session ID = f239dc6f-4b58-49f9-ad02-4c73673737d8),(3,'c'),(4,'d'); ++ [INFO] Submitting SQL update statement to the cluster... + [INFO] SQL update statement has been successfully submitted to the cluster: + Job ID: d0542da4c4252f9494298666ff4e9f8e ++ Flink SQL> set execution.runtime-mode=batch; + [INFO] Session property has been set. ++ Flink SQL> select * from tttestHive Session ID = 61b6eb3b-90a6-499c-aced-0598366c5b31 ++ +--+-+ + | key | value | + +--+-+ + | 1 | a | + | 1 | a | + | 2 | b | + | 3 | c | + | 3 | c | + | 3 | c | + | 4 | d | + | 5 | e | + +--+-+ + 8 rows in set ++ Flink SQL> QUIT;Hive Session ID = 2dadad92-436e-426e-a88c-66eafd740d98 ++ [INFO] Exiting Flink SQL CLI Client... ++ Shutting down the session... + done. + root [ ~ ]# exit + ``` ++ The data is written in the same container configured in the hive/warehouse directory. ++ :::image type="content" source="./media/hive-dialect-flink/flink-container-table-1.png" alt-text="Screenshot shows container table 1." lightbox="./media/hive-dialect-flink/flink-container-table-1.png"::: ++ - Executing Hive dialect in Flink with partitions ++```sql + create table tblpart2 (key int, value string) PARTITIONED by ( part string ) tblproperties ('sink.partition-commit.delay'='1 s', 'sink.partition-commit.policy.kind'='metastore,success-file'); ++ insert into table tblpart2 Hive Session ID = 78fae85f-a451-4110-bea6-4aa1c172e282),(2,'b','d'),(3,'c','d'),(3,'c','a'),(4,'d','e'); +``` + :::image type="content" source="./media/hive-dialect-flink/flink-container-table-2.png" alt-text="Screenshot shows container table 2." lightbox="./media/hive-dialect-flink/flink-container-table-2.png"::: ++ :::image type="content" source="./media/hive-dialect-flink/flink-container-table-3.png" alt-text="Screenshot shows container table 3." lightbox="./media/hive-dialect-flink/flink-container-table-3.png"::: |
hdinsight-aks | Integration Of Azure Data Explorer | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/integration-of-azure-data-explorer.md | + + Title: Integration of Azure Data Explorer and Flink +description: Integration of Azure Data Explorer and Flink in HDInsight on AKS ++ Last updated : 09/18/2023+++# Integration of Azure Data Explorer and Flink ++Azure Data Explorer is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real time. ++ADX helps users analyze large volumes of data from streaming applications, websites, IoT devices, and more. Integrating Flink with ADX helps you process real-time data and analyze it in ADX. ++## Prerequisites +- [Create HDInsight on AKS Flink cluster](./flink-create-cluster-portal.md) +- [Create Azure data explorer](/azure/data-explorer/create-cluster-and-database/) ++## Steps to use Azure Data Explorer as sink in Flink ++1. [Create HDInsight on AKS Flink cluster](./flink-create-cluster-portal.md). ++1. [Create ADX with database](/azure/data-explorer/create-cluster-and-database/) and table as required. ++1. Add ingestor permissions for the managed identity in Kusto. ++ ``` + .add database <DATABASE_NAME> ingestors ('aadapp=CLIENT_ID_OF_MANAGED_IDENTITY') + ``` +1. Run a sample program defining the Kusto cluster URI (Uniform Resource Identifier), the database and managed identity used, and the table it needs to write to. ++1. Clone the flink-connector-kusto project: https://github.com/Azure/flink-connector-kusto.git ++1. Create the table in ADX using the following command + + ``` + .create table CryptoRatesHeartbeatTimeBatch (processing_dttm: datetime, ['type']: string, last_trade_id: string, product_id: string, sequence: long, ['time']: datetime) + ``` + ++1. Update the FlinkKustoSinkSample.java file with the right Kusto cluster URI, database, and the managed identity used. ++ ```JAVA + String database = "sdktests"; //ADX database name ++ String msiClientId = "xxxx-xxxx-xxxx"; //Provide the client id of the Managed identity which is linked to the Flink cluster + String cluster = "https://trdp-1665b5eybxs0tbett.z8.kusto.fabric.microsoft.com/"; //Data explorer Cluster URI + KustoConnectionOptions kustoConnectionOptions = KustoConnectionOptions.builder() + .setManagedIdentityAppId(msiClientId).setClusterUrl(cluster).build(); + String defaultTable = "CryptoRatesHeartbeatTimeBatch"; //Table where the data needs to be written + KustoWriteOptions kustoWriteOptionsHeartbeat = KustoWriteOptions.builder() + .withDatabase(database).withTable(defaultTable).withBatchIntervalMs(30000) + ``` + ++ Then build the project using `mvn clean package`. ++1. Locate the JAR file named 'samples-java-1.0-SNAPSHOT-shaded.jar' under the 'sample-java/target' folder, then upload this JAR file in the Flink UI and submit the job. ++1. Query the Kusto table to verify the output; a small verification sketch also follows below. + + :::image type="content" source="./media/integration-of-azure-data-explorer/kusto-table-to-verify-output.png" alt-text="Screenshot shows query the Kusto table to verify the output." lightbox="./media/integration-of-azure-data-explorer/kusto-table-to-verify-output.png"::: ++ There is no delay in writing the data to the Kusto table from Flink. + |
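As referenced above, the sink table can also be checked programmatically rather than in the portal. The following is a small, hypothetical Python sketch using the `azure-kusto-data` package; the cluster URI is a placeholder, and the database and table names are the ones used in the sample above. It assumes you're signed in with the Azure CLI.

```python
# Hypothetical verification sketch; assumes the azure-kusto-data package and an az CLI login.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-adx-cluster>.<region>.kusto.windows.net"  # Data Explorer cluster URI (placeholder)
database = "sdktests"                                              # database used in the sample
query = "CryptoRatesHeartbeatTimeBatch | count"                    # row count of the sink table

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

response = client.execute(database, query)
for row in response.primary_results[0]:
    print(row)  # prints the current row count written by the Flink job
```

Rerunning the query while the Flink job is active should show the count increasing as records are ingested.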
hdinsight-aks | Join Stream Kafka Table Filesystem | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/join-stream-kafka-table-filesystem.md | + + Title: Enrich the events from Kafka with the attributes from FileSystem with Apache Flink +description: Learn how to join stream from Kafka with table from fileSystem using DataStream API ++ Last updated : 08/29/2023+++# Enrich the events from Kafka with attributes from ADLS Gen2 with Apache Flink +++In this article, you can learn how you can enrich the real time events by joining a stream from Kafka with table on ADLS Gen2 using Flink Streaming. We use Flink Streaming API to join events from HDInsight Kafka with attributes from ADLS Gen2, further we use attributes-joined events to sink into another Kafka topic. ++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* [HDInsight Kafka](../../hdinsight/kafk) + * You're required to ensure the network settings are taken care as described on [Using HDInsight Kafka](../flink/process-and-consume-data.md); that's to make sure HDInsight on AKS Flink and HDInsight Kafka are in the same VNet +* For this demonstration, we're using a Window VM as maven project develop environment in the same VNet as HDInsight on AKS ++## Kafka topic preparation ++We're creating a topic called `user_events`. ++- The purpose is to read a stream of real-time events from a Kafka topic using Flink. We have every event with the following fields: + ``` + user_id, + item_id, + type, + timestamp, + ``` ++**Kafka 2.4.1** +``` +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic user_events --zookeeper zk0-contos:2181 +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic user_events_output --zookeeper zk0-contos:2181 +``` ++**Kafka 3.2.0** +``` +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic user_events --bootstrap-server wn0-contsk:9092 +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic user_events_output --bootstrap-server wn0-contsk:9092 +``` ++## Prepare file on ADLS Gen2 ++We are creating a file called `item attributes` in our storage ++- The purpose is to read a batch of `item attributes` from a file on ADLS Gen2. Each item has the following fields: + ``` + item_id, + brand, + category, + timestamp, + ``` +++## Develop the Apache Flink job ++In this step we perform the following activities +- Enrich the `user_events` topic from Kafka by joining with `item attributes` from a file on ADLS Gen2. +- We push the outcome of this step, as an enriched user activity of events into a Kafka topic. 
++### Develop Maven project ++**pom.xml** ++``` xml +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> ++ <groupId>contoso.example</groupId> + <artifactId>FlinkKafkaJoinGen2</artifactId> + <version>1.0-SNAPSHOT</version> ++ <properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> + <kafka.version>3.2.0</kafka.version> //replace with 2.4.1 if you are using HDInsight Kafka 2.4.1 + </properties> + <dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + </dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` ++**Join the Kafka topic with ADLS Gen2 File** ++**KafkaJoinGen2Demo.java** ++``` java +package contoso.example; ++import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.RichMapFunction; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.api.java.tuple.Tuple4; +import org.apache.flink.api.java.tuple.Tuple7; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema; +import org.apache.flink.connector.kafka.sink.KafkaSink; +import org.apache.flink.connector.kafka.source.KafkaSource; +import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; ++import java.io.BufferedReader; +import java.io.FileReader; +import java.util.HashMap; +import java.util.Map; ++public class KafkaJoinGen2Demo { + public static void main(String[] args) throws Exception { 
+ // 1. Set up the stream execution environment + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ++ // Kafka source configuration, update with your broker IPs + String brokers = "<broker-ip>:9092,<broker-ip>:9092,<broker-ip>:9092"; + String inputTopic = "user_events"; + String outputTopic = "user_events_output"; + String groupId = "my_group"; ++ // 2. Register the cached file, update your container name and storage name + env.registerCachedFile("abfs://<container-name>@<storagename>.dfs.core.windows.net/flink/data/item.txt", "file1"); ++ // 3. Read a stream of real-time user behavior event from a Kafka topic + KafkaSource<String> kafkaSource = KafkaSource.<String>builder() + .setBootstrapServers(brokers) + .setTopics(inputTopic) + .setGroupId(groupId) + .setStartingOffsets(OffsetsInitializer.earliest()) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++ DataStream<String> kafkaData = env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "Kafka Source"); ++ // Parse Kafka source data + DataStream<Tuple4<String, String, String, String>> userEvents = kafkaData.map(new MapFunction<String, Tuple4<String, String, String, String>>() { + @Override + public Tuple4<String, String, String, String> map(String value) throws Exception { + // Parse the line into a Tuple4 + String[] parts = value.split(","); + return new Tuple4<>(parts[0], parts[1], parts[2], parts[3]); + } + }); ++ // 4. Enrich the user activity events by joining the items' attributes from a file + DataStream<Tuple7<String,String,String,String,String,String,String>> enrichedData = userEvents.map(new MyJoinFunction()); ++ // 5. Output the enriched user activity events to a Kafka topic + KafkaSink<String> sink = KafkaSink.<String>builder() + .setBootstrapServers(brokers) + .setRecordSerializer(KafkaRecordSerializationSchema.builder() + .setTopic(outputTopic) + .setValueSerializationSchema(new SimpleStringSchema()) + .build() + ) + .build(); ++ enrichedData.map(value -> value.toString()).sinkTo(sink); ++ // 6. 
Execute the Flink job + env.execute("Kafka Join Batch gen2 file, sink to another Kafka Topic"); + } ++ private static class MyJoinFunction extends RichMapFunction<Tuple4<String,String,String,String>, Tuple7<String,String,String,String,String,String,String>> { + private Map<String, Tuple4<String, String, String, String>> itemAttributes; ++ @Override + public void open(Configuration parameters) throws Exception { + super.open(parameters); ++ // Read the cached file and parse its contents into a map + itemAttributes = new HashMap<>(); + try (BufferedReader reader = new BufferedReader(new FileReader(getRuntimeContext().getDistributedCache().getFile("file1")))) { + String line; + while ((line = reader.readLine()) != null) { + String[] parts = line.split(","); + itemAttributes.put(parts[0], new Tuple4<>(parts[0], parts[1], parts[2], parts[3])); + } + } + } ++ @Override + public Tuple7<String,String,String,String,String,String,String> map(Tuple4<String,String,String,String> value) throws Exception { + Tuple4<String, String, String, String> broadcastValue = itemAttributes.get(value.f1); + if (broadcastValue != null) { + return Tuple7.of(value.f0,value.f1,value.f2,value.f3,broadcastValue.f1,broadcastValue.f2,broadcastValue.f3); + } else { + return null; + } + } + } +} +``` ++## Package jar and submit to Apache Flink ++We're submitting the packaged jar to Flink: +++++### Produce real-time `user_events` topic on Kafka ++ We are able to produce real-time user behavior event `user_events` in Kafka. +++### Consume the `itemAttributes` joining with `user_events` on Kafka ++We are now using `itemAttributes` on filesystem join user activity events `user_events`. +++We continue to produce and consume the user activity and item attributes in the following images ++++## Reference ++[Flink Examples](https://github.com/flink-extended/) |
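As an aside to the screenshots above, events for the `user_events` topic can be produced from any Kafka client. The following is a minimal, hypothetical sketch using the `kafka-python` package that emits comma-separated records matching the `user_id,item_id,type,timestamp` layout the job parses; the broker addresses are placeholders, and the item IDs should match the `item_id` values in the attributes file on ADLS Gen2 for the join to produce enriched output.

```python
# Hypothetical producer sketch; assumes the kafka-python package and network access to the brokers.
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=["<broker-ip>:9092"])

# Item IDs should match the item_id values in the item attributes file on ADLS Gen2.
item_ids = ["item1", "item2", "item3"]
event_types = ["view", "click", "purchase"]

for _ in range(100):
    # Comma-separated record in the user_id,item_id,type,timestamp layout the Flink job parses.
    record = (
        f"user{random.randint(1, 5)},{random.choice(item_ids)},"
        f"{random.choice(event_types)},{int(time.time())}"
    )
    producer.send("user_events", value=record.encode("utf-8"))
    time.sleep(0.5)

producer.flush()
```

While this runs, the enriched seven-field tuples should appear on the `user_events_output` topic, as shown in the consumer screenshots above.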
hdinsight-aks | Monitor Changes Postgres Table Flink | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/monitor-changes-postgres-table-flink.md | + + Title: Change Data Capture (CDC) of PostgreSQL table using Apache FlinkSQL +description: Learn how to perform CDC on PostgreSQL table using Apache FlinkSQL CDC ++ Last updated : 08/29/2023+++# Change Data Capture (CDC) of PostgreSQL table using Apache FlinkSQL +++Change Data Capture (CDC) is a technique you can use to track row-level changes in database tables in response to create, update, and delete operations. In this article, we use [CDC Connectors for Apache Flink®](https://github.com/ververica/flink-cdc-connectors), which offer a set of source connectors for Apache Flink. The connectors integrate [Debezium®](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/debezium/#debezium-format) as the engine to capture the data changes. ++Flink supports to interpret Debezium JSON and Avro messages as INSERT/UPDATE/DELETE messages into Apache Flink SQL system. ++This support is useful in many cases to: ++- Synchronize incremental data from databases to other systems +- Audit logs +- Build real-time materialized views on databases +- View temporal join changing history of a database table +++Now, let's learn how to monitor changes on PostgreSQL table using Flink-SQL CDC. The PostgreSQL CDC connector allows for reading snapshot data and incremental data from PostgreSQL database. ++## Prerequisites ++* [Azure PostgresSQL flexible server Version 14.7](/azure/postgresql/flexible-server/overview) +* [HDInsight on AKS Flink 1.16.0](./flink-create-cluster-portal.md) +* Linux virtual Machine to use PostgreSQL client +* Add the NSG rule that allows inbound and outbound connections on port 5432 in HDInsight on AKS pool subnet. ++## Prepare PostgreSQL table & Client ++- Using a Linux virtual machine, install PostgreSQL client using below commands ++ ``` + sudo apt-get update + sudo apt-get install postgresql-client + ``` ++- Install the certificate to connect to PostgreSQL server using SSL ++ `wget --no-check-certificate https://dl.cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem` ++- Connect to the server (replace host, username and database name accordingly) ++ ``` + psql --host=flinkpostgres.postgres.database.azure.com --port=5432 --username=admin --dbname=postgres --set=sslmode=require --set=sslrootcert=DigiCertGlobalRootCA.crt.pem + ``` +- After connecting to the database successfully, create a sample table + ``` + CREATE TABLE shipments ( + shipment_id SERIAL NOT NULL PRIMARY KEY, + order_id SERIAL NOT NULL, + origin VARCHAR(255) NOT NULL, + destination VARCHAR(255) NOT NULL, + is_arrived BOOLEAN NOT NULL + ); + ALTER SEQUENCE public.shipments_shipment_id_seq RESTART WITH 1001; + ALTER TABLE public.shipments REPLICA IDENTITY FULL; + INSERT INTO shipments + VALUES (default,10001,'Beijing','Shanghai',false), + (default,10002,'Hangzhou','Shanghai',false), + (default,10003,'Shanghai','Hangzhou',false); + ``` ++- To enable CDC on PostgreSQL database, you're required to make the following changes. + + - WAL level must be changed to **logical**. This value can be changed in server parameters section on Azure portal. ++ :::image type="content" source="./media/monitor-changes-postgres-table-flink/enable-cdc-on-postgres-database.png" alt-text="Screenshot showing how to enable-cdc-on-postgres-database." 
border="true" lightbox="./media/monitor-changes-postgres-table-flink/enable-cdc-on-postgres-database.png"::: ++ - User accessing the table must have 'REPLICATION' role added ++ ALTER USER `<username>` WITH REPLICATION; ++## Create Apache Flink PostgreSQL CDC table ++- To create Flink PostgreSQL CDC table, download all the dependent jars. Use the `pom.xml` file with the following contents. ++ ```xml + <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> + <groupId>com.dep.download</groupId> + <artifactId>dep-download</artifactId> + <version>1.0-SNAPSHOT</version> + <!-- https://mvnrepository.com/artifact/com.ververica/flink-sql-connector-sqlserver-cdc --> + <dependencies> + <dependency> + <groupId>com.ververica</groupId> + <artifactId>flink-sql-connector-postgres-cdc</artifactId> + <version>2.3.0</version> + </dependency> + </dependencies> + </project> + ``` +- Use maven command to download all the dependent jars ++ ``` + mvn -DoutputDirectory=target -f pom.xml dependency:copy-dependencies -X + ``` ++ > [!NOTE] + > * If your web ssh pod does not contain maven please follow the links to download and install it. + > * https://maven.apache.org/download.cgi + > * https://maven.apache.org/install.html + > * In order to download jsr jar file use the following command + > * `wget https://repo1.maven.org/maven2/net/java/loci/jsr308-all/1.1.2/jsr308-all-1.1.2.jar` ++- Once the dependent jars are downloaded start the [Flink SQL client](./flink-web-ssh-on-portal-to-flink-sql.md), with these jars to be imported into the session. Complete command as follows, ++ ```sql + /opt/flink-webssh/bin/sql-client.sh -j + /opt/flink-webssh/target/flink-sql-connector-postgres-cdc-2.3.0.jar -j + /opt/flink-webssh/target/slf4j-api-1.7.15.jar -j + /opt/flink-webssh/target/hamcrest-2.1.jar -j + /opt/flink-webssh/target/flink-shaded-guava-30.1.1-jre-16.0.jar -j + /opt/flink-webssh/target/awaitility-4.0.1.jar -j + /opt/flink-webssh/target/jsr308-all-1.1.2.jar + ``` + These commands start the sql client with the dependencies as, ++ :::image type="content" source="./media/monitor-changes-postgres-table-flink/start-the-sql-client.png" alt-text="Screenshot showing start-the-sql-client." border="true" lightbox="./media/monitor-changes-postgres-table-flink/start-the-sql-client.png"::: ++ :::image type="content" source="./media/monitor-changes-postgres-table-flink/sql-client-status.png" alt-text="Screenshot showing sql-client-status." border="true" lightbox="./media/monitor-changes-postgres-table-flink/sql-client-status.png"::: +++- Create a Flink PostgreSQL CDC table using CDC connector ++ ``` + CREATE TABLE shipments ( + shipment_id INT, + order_id INT, + origin STRING, + destination STRING, + is_arrived BOOLEAN, + PRIMARY KEY (shipment_id) NOT ENFORCED + ) WITH ( + 'connector' = 'postgres-cdc', + 'hostname' = 'flinkpostgres.postgres.database.azure.com', + 'port' = '5432', + 'username' = 'username', + 'password' = 'admin', + 'database-name' = 'postgres', + 'schema-name' = 'public', + 'table-name' = 'shipments', + 'decoding.plugin.name' = 'pgoutput' + ); + ``` +## Validation ++- Run 'select *' command to monitor the changes. + + `select * from shipments;` ++ :::image type="content" source="./media/monitor-changes-postgres-table-flink/run-select-command.png" alt-text="Screenshot showing how to run-select-command." 
border="true" lightbox="./media/monitor-changes-postgres-table-flink/run-select-command.png"::: ++### Reference ++[PostgreSQL CDC Connector](https://ververica.github.io/flink-cdc-connectors/release-2.1/content/connectors/postgres-cdc.html) is licensed under [Apache 2.0 License](https://github.com/ververica/flink-cdc-connectors/blob/master/LICENSE) |
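To see the change stream in action, apply a few changes to the source table from the PostgreSQL client; the running `select * from shipments;` query above should show the corresponding insert, update, and delete rows. A minimal example against the sample `shipments` table created earlier:

```
INSERT INTO shipments
VALUES (default,10004,'Shanghai','Beijing',false);

UPDATE shipments SET is_arrived = true WHERE shipment_id = 1001;

DELETE FROM shipments WHERE shipment_id = 1002;
```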
hdinsight-aks | Process And Consume Data | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/process-and-consume-data.md | + + Title: Using HDInsight Kafka with HDInsight on AKS Apache Flink +description: Learn how to use HDInsight Kafka with HDInsight on AKS Apache Flink ++ Last updated : 08/29/2023++ +# Using HDInsight Kafka with HDInsight on AKS - Apache Flink +++A well known use case for Apache Flink is stream analytics. The popular choice by many users to use the data streams, which are ingested using Apache Kafka. Typical installations of Flink and Kafka start with event streams being pushed to Kafka, which can be consumed by Flink jobs. ++This example uses HDInsight on AKS Flink 1.16.0 to process streaming data consuming and producing Kafka topic. ++> [!NOTE] +> FlinkKafkaConsumer is deprecated and will be removed with Flink 1.17, please use KafkaSource instead. +> FlinkKafkaProducer is deprecated and will be removed with Flink 1.15, please use KafkaSink instead. ++## Prerequisites ++* Both Kafka and Flink need to be in the same VNet or there should be vnet-peering between the two clusters. +* [Creation of VNet](../../hdinsight/hdinsight-create-virtual-network.md). +* [Create a Kafka cluster in the same VNet](../../hdinsight/kafk). You can choose Kafka 3.2 or 2.4 on HDInsight based on your current usage. ++ :::image type="content" source="./media/process-consume-data/create-kafka-cluster-in-the-same-vnet.png" alt-text="Screenshot showing how to create a Kafka cluster in the same VNet." border="true" lightbox="./media/process-consume-data/create-kafka-cluster-in-the-same-vnet.png"::: + +* Add the VNet details in the virtual network section. +* Create a [HDInsight on AKS Cluster pool](../quickstart-create-cluster.md) with same VNet. +* Create a Flink cluster to the cluster pool created. ++## Apache Flink-Kafka Connector ++Flink provides an [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/) for reading data from and writing data to Kafka topics with exactly once guarantees. ++**Maven dependency** +``` xml + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>1.16.0</version> + </dependency> +``` ++## Building Kafka Sink ++Kafka sink provides a builder class to construct an instance of a KafkaSink. We use the same to construct our Sink and use it along with HDInsight on AKS Flink ++**SinKafkaToKafka.java** +``` java +import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.connector.base.DeliveryGuarantee; ++import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema; +import org.apache.flink.connector.kafka.sink.KafkaSink; +import org.apache.flink.connector.kafka.source.KafkaSource; +import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; ++import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.datastream.DataStreamSource; +import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; ++public class SinKafkaToKafka { + public static void main(String[] args) throws Exception { + // 1. 
get stream execution environment + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ++ // 2. read kafka message as stream input, update your broker IPs below + String brokers = "X.X.X.X:9092,X.X.X.X:9092,X.X.X.X:9092"; + KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers(brokers) + .setTopics("clicks") + .setGroupId("my-group") + .setStartingOffsets(OffsetsInitializer.earliest()) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++ DataStream<String> stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source"); + + // 3. transformation: + // https://www.taobao.com,1000 > + // Event{user: "Tim",url: "https://www.taobao.com",timestamp: 1970-01-01 00:00:01.0} + SingleOutputStreamOperator<String> result = stream.map(new MapFunction<String, String>() { + @Override + public String map(String value) throws Exception { + String[] fields = value.split(","); + return new Event(fields[0].trim(), fields[1].trim(), Long.valueOf(fields[2].trim())).toString(); + } + }); ++ // 4. sink click into another kafka events topic + KafkaSink<String> sink = KafkaSink.<String>builder() + .setBootstrapServers(brokers) + .setProperty("transaction.timeout.ms","900000") + .setRecordSerializer(KafkaRecordSerializationSchema.builder() + .setTopic("events") + .setValueSerializationSchema(new SimpleStringSchema()) + .build()) + .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE) + .build(); ++ result.sinkTo(sink); ++ // 5. execute the stream + env.execute("kafka Sink to other topic"); + } +} +``` +**Writing a Java program Event.java** +``` java +import java.sql.Timestamp; ++public class Event { ++ public String user; + public String url; + public Long timestamp; ++ public Event() { + } ++ public Event(String user,String url,Long timestamp) { + this.user = user; + this.url = url; + this.timestamp = timestamp; + } ++ @Override + public String toString(){ + return "Event{" + + "user: \"" + user + "\"" + + ",url: \"" + url + "\"" + + ",timestamp: " + new Timestamp(timestamp) + + "}"; + } +} +``` +## Package the jar and submit the job to Flink ++++## Produce the topic - clicks on Kafka +++## Consume the topic - events on Kafka +++## Reference ++* [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka) |
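To try the job end to end, you can produce a few `clicks` records in the `user,url,timestamp` format that the `MapFunction` parses, and watch the transformed `Event` strings arrive on the `events` topic. The host name below follows the sample cluster naming used in this article; replace it with your own broker.

```
# Produce input in the form user,url,epoch-millis
echo "Tim,https://www.taobao.com,1000" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic clicks

# Consume the transformed events from the output topic
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server wn0-contsk:9092 --topic events --from-beginning
```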
hdinsight-aks | Sink Kafka To Kibana | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/sink-kafka-to-kibana.md | + + Title: Use Elasticsearch along with HDInsight on AKS - Apache Flink +description: Learn how to use Elasticsearch along HDInsight on AKS - Apache Flink ++ Last updated : 08/29/2023+++# Using Elasticsearch with HDInsight on AKS - Apache Flink +++Flink for real-time analytics can be used to build a dashboard application that visualizes the streaming data using Elasticsearch and Kibana. ++Flink can be used to analyze a stream of taxi ride events and compute metrics. Metrics can include number of rides per hour, the average fare per ride, or the most popular pickup locations. You can write these metrics to an Elasticsearch index using a Flink sink and use Kibana to connect and create charts or dashboards to display metrics in real-time. ++In this article, learn how to Use Elastic along HDInsight Flink. ++## Elasticsearch and Kibana ++Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including +* Textual +* Numerical +* Geospatial +* Structured +* Unstructured. ++Kibana is a free and open frontend application that sits on top of the elastic stack, providing search and data visualization capabilities for data indexed in Elasticsearch. ++For more information, refer +* [Elasticsearch](https://www.elastic.co) +* [Kibana](https://www.elastic.co/what-is/kibana) +++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](./flink-create-cluster-portal.md) +* Elasticsearch-7.13.2 +* Kibana-7.13.2 +* [HDInsight 5.0 - Kafka 2.4.1](../../hdinsight/kafk) +* IntelliJ IDEA for development on an Azure VM which in the same Vnet +++### How to Install Elasticsearch on Ubuntu 20.04 ++- APT Update & Install OpenJDK +- Add Elastic Search GPG key and Repository + - Steps for adding the GPG key + ``` + sudo apt-get install apt-transport-https + wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg + ``` + - Add Repository + ``` + echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list + ``` +- Run system update +``` +sudo apt update +``` ++- Install ElasticSearch on Ubuntu 20.04 Linux +``` +sudo apt install elasticsearch +``` +- Start ElasticSearch Services + + - Reload Daemon: + ``` + sudo systemctl daemon-reload + ``` + - Enable + ``` + sudo systemctl enable elasticsearch + ``` + - Start + ``` + sudo systemctl start elasticsearch + ``` + - Check Status + ``` + sudo systemctl status elasticsearch + ``` + - Stop + ``` + sudo systemctl stop elasticsearch + ``` ++### How to Install Kibana on Ubuntu 20.04 ++For installing and configuring Kibana Dashboard, we donΓÇÖt need to add any other repository because the packages are available through the already added ElasticSearch. 
++We use the following command to install Kibana ++``` +sudo apt install kibana +``` ++- Reload daemon + ``` + sudo systemctl daemon-reload + ``` + - Start and Enable: + ``` + sudo systemctl enable kibana + sudo systemctl start kibana + ``` + - To check the status: + ``` + sudo systemctl status kibana + ``` +### Access the Kibana Dashboard web interface ++In order to make Kibana accessible from output, need to set network.host to 0.0.0.0 ++configure /etc/kibana/kibana.yml on Ubuntu VM ++> [!NOTE] +> 10.0.1.4 is a local private IP, that we have used which can be accessed in maven project develop Windows VM. You're required to make modifications according to your network security requirements. We use the same IP later to demo for performing analytics on Kibana. ++``` +server.host: "0.0.0.0" +server.name: "elasticsearch" +server.port: 5601 +elasticsearch.hosts: ["http://10.0.1.4:9200"] +``` +++## Prepare Click Events on HDInsight Kafka ++We use python output as input to produce the streaming data ++``` +sshuser@hn0-contsk:~$ python weblog.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic click_events +``` +Now, lets check messages in this topic ++``` +sshuser@hn0-contsk:~$ /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server wn0-contsk:9092 --topic click_events +``` +``` +{"userName": "Tim", "visitURL": "https://www.bing.com/new", "ts": "07/31/2023 05:47:12"} +{"userName": "Luke", "visitURL": "https://github.com", "ts": "07/31/2023 05:47:12"} +{"userName": "Zark", "visitURL": "https://github.com", "ts": "07/31/2023 05:47:12"} +{"userName": "Zark", "visitURL": "https://docs.python.org", "ts": "07/31/2023 05:47:12"} +``` +++## Creating Kafka Sink to Elastic ++Let us write maven source code on the Windows VM ++**Main: kafkaSinkToElastic.java** +``` java +import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkBuilder; +import org.apache.flink.connector.kafka.source.KafkaSource; +import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; +import org.apache.http.HttpHost; +import org.elasticsearch.action.index.IndexRequest; +import org.elasticsearch.client.Requests; ++import java.util.HashMap; +import java.util.Map; ++public class kafkaSinkToElastic { + public static void main(String[] args) throws Exception { + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1); ++ // 1. read kafka message + String kafka_brokers = "<broker1 IP>:9092,<broker2 IP>:9092,<broker3 IP>:9092"; + KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers(kafka_brokers) + .setTopics("click_events") + .setGroupId("my-group") + .setStartingOffsets(OffsetsInitializer.earliest()) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++ DataStream<String> kafka = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source"); ++ // 2. sink to elasticsearch + kafka.sinkTo( + new Elasticsearch7SinkBuilder<String>() + .setBulkFlushMaxActions(1) + .setHosts(new HttpHost("10.0.1.4", 9200, "http")) + .setEmitter( + (element, context, indexer) -> indexer.add(createIndexRequest(element))) + .build()); ++ // 3. 
execute stream + env.execute("Kafka Sink To Elastic"); ++ } + private static IndexRequest createIndexRequest(String element) { + String[] logContent =element.replace("{","").replace("}","").split(","); + Map<String, String> esJson = new HashMap<>(); + esJson.put("username", logContent[0]); + esJson.put("visitURL", logContent[1]); + esJson.put("ts", logContent[2]); + return Requests.indexRequest() + .index("kafka_user_clicks") + .id(element) + .source(esJson); + } +} +``` ++**Creating a pom.xml on Maven** ++``` xml +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> + <groupId>contoso.example</groupId> + <artifactId>FlinkElasticSearch</artifactId> + <version>1.0-SNAPSHOT</version> + <properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <kafka.version>3.2.0</kafka.version> + </properties> + <dependencies> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-core --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-core</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-elasticsearch-base --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-elasticsearch7</artifactId> + <version>${flink.version}</version> + </dependency> + </dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` ++**Package the jar and submit to Flink to run on WebSSH** ++On [Secure Shell for Flink](./flink-web-ssh-on-portal-to-flink-sql.md), you can use the following commands ++``` +msdata@pod-0 [ ~ ]$ ls -l FlinkElasticSearch-1.0-SNAPSHOT.jar +-rw-r-- 1 msdata msdata 114616575 Jul 31 06:09 FlinkElasticSearch-1.0-SNAPSHOT.jar +msdatao@pod-0 [ ~ ]$ bin/flink run -c contoso.example.kafkaSinkToElastic -j FlinkElasticSearch-1.0-SNAPSHOT.jar +Job has been submitted with JobID e0eba72d5143cea53bcf072335a4b1cb +``` +## Start Elasticsearch and Kibana to perform analytics on Kibana ++**startup Elasticsearch and Kibana on Ubuntu 
VM and use Kibana to visualize results** ++- Access Kibana at the IP address that you set earlier. +- Configure an index pattern by clicking **Stack Management** in the left-side toolbar, find **Index Patterns**, then click **Create Index Pattern** and enter the full index name kafka_user_clicks to create the index pattern. +++- Once the index pattern is set up, you can explore the data in Kibana. + - Click "Discover" in the left-side toolbar. + + :::image type="content" source="./media/sink-kafka-to-kibana/kibana-discover.png" alt-text="Screenshot showing how to navigate to discover button." lightbox="./media/sink-kafka-to-kibana/kibana-discover.png"::: + + - Kibana lists the content of the created index with the kafka-click-events. + + :::image type="content" source="./media/sink-kafka-to-kibana/elastic-discover-kafka-click-events.png" alt-text="Screenshot showing elastic with the created index with the kafka-click-events." lightbox="./media/sink-kafka-to-kibana/elastic-discover-kafka-click-events.png" ::: + +- Let us create a dashboard to display various views. + +- Let's use an **Area** (area graph), then select the **kafka_click_events** index and edit the horizontal axis and vertical axis to illustrate the events. + ++- If we set an auto refresh or click **Refresh**, the plot updates in real time because the Flink streaming job keeps producing data. ++++## Validation on Apache Flink Job UI ++You can find the job in a running state on your Flink Web UI. +++## Reference +* [Apache Kafka SQL Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka) +* [Elasticsearch SQL Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/elasticsearch) |
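Besides Kibana, you can confirm that documents are reaching the index by querying Elasticsearch directly from the Ubuntu VM. The address and index name below are the ones used in this walkthrough (10.0.1.4 and kafka_user_clicks); adjust them to your environment.

```
# Count documents indexed so far
curl "http://10.0.1.4:9200/kafka_user_clicks/_count?pretty"

# Fetch a few sample documents
curl "http://10.0.1.4:9200/kafka_user_clicks/_search?pretty&size=3"
```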
hdinsight-aks | Sink Sql Server Table Using Flink Sql | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/sink-sql-server-table-using-flink-sql.md | + + Title: Change Data Capture (CDC) of SQL Server using Apache Flink SQL +description: Learn how to perform CDC of SQL Server using Apache Flink SQL ++ Last updated : 08/29/2023+++# Change Data Capture (CDC) of SQL Server using Apache Flink SQL +++Change Data Capture (CDC) is a technique you can use to track row-level changes in database tables in response to create, update, and delete operations. In this article, we use [CDC Connectors for Apache Flink®](https://github.com/ververica/flink-cdc-connectors), which offer a set of source connectors for Apache Flink. The connectors integrate [Debezium®](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/debezium/#debezium-format) as the engine to capture the data changes. ++Flink supports to interpret Debezium JSON and Avro messages as INSERT/UPDATE/DELETE messages into Flink SQL system. ++This support is useful in many cases to: ++- Synchronize incremental data from databases to other systems +- Audit logs +- Build real-time materialized views on databases +- View temporal join changing history of a database table ++Now, let us learn how to use Change Data Capture (CDC) of SQL Server using Flink SQL. The SQLServer CDC connector allows for reading snapshot data and incremental data from SQLServer database. ++## Prerequisites + * [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) + * [Azure SQL Server](/azure/azure-sql/azure-sql-iaas-vs-paas-what-is-overview) ++### Apache Flink SQLServer CDC Connector ++The SQLServer CDC connector is a Flink Source connector, which reads database snapshot first and then continues to read change events with exactly once processing even failures happen. This example uses FLINK CDC to create a SQLServerCDC table on FLINK SQL ++### Use SSH to use Flink SQL-client ++We have already covered this section in detail on how to use [secure shell](./flink-web-ssh-on-portal-to-flink-sql.md) with Flink. ++## Prepare table and enable cdc feature on SQL Server sqldb ++Let us prepare a table and enable the CDC, You can refer the detailed steps listed on [SQL Documentation](/sql/relational-databases/track-changes/enable-and-disable-change-data-capture-sql-server?) ++**Create a database** +``` SQL +CREATE DATABASE inventory; +GO +``` ++**Enable CDC on the SQL Server database** ++``` SQL +USE inventory; +EXEC sys.sp_cdc_enable_db; +GO +``` ++**Verify that the user has access to the CDC table** +``` SQL +USE inventory +GO +EXEC sys.sp_cdc_help_change_data_capture +GO +``` ++> [!NOTE] +> The query returns configuration information for each table in the database (enabled for CDC). If the result is empty, verify that the user has privileges to access both the capture instance as well as the CDC tables. 
+++**Create and populate our products using a single insert with many rows** ++``` SQL +CREATE TABLE products ( +id INTEGER IDENTITY(101,1) NOT NULL PRIMARY KEY, +name VARCHAR(255) NOT NULL, +description VARCHAR(512), +weight FLOAT +); ++INSERT INTO products(name,description,weight) +VALUES ('scooter','Small 2-wheel scooter',3.14); +INSERT INTO products(name,description,weight) +VALUES ('car battery','12V car battery',8.1); +INSERT INTO products(name,description,weight) +VALUES ('12-pack drill bits','12-pack of drill bits with sizes ranging from #40 to #3',0.8); +INSERT INTO products(name,description,weight) +VALUES ('hammer','12oz carpenter''s hammer',0.75); +INSERT INTO products(name,description,weight) +VALUES ('hammer','14oz carpenter''s hammer',0.875); +INSERT INTO products(name,description,weight) +VALUES ('hammer','16oz carpenter''s hammer',1.0); +INSERT INTO products(name,description,weight) +VALUES ('rocks','box of assorted rocks',5.3); +INSERT INTO products(name,description,weight) +VALUES ('jacket','water resistent black wind breaker',0.1); +INSERT INTO products(name,description,weight) +VALUES ('spare tire','24 inch spare tire',22.2); ++EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'products', @role_name = NULL, @supports_net_changes = 0; ++-- Creating simple orders on SQL Table ++CREATE TABLE orders ( +id INTEGER IDENTITY(10001,1) NOT NULL PRIMARY KEY, +order_date DATE NOT NULL, +purchaser INTEGER NOT NULL, +quantity INTEGER NOT NULL, +product_id INTEGER NOT NULL, +FOREIGN KEY (product_id) REFERENCES products(id) +); ++INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('16-JAN-2016', 1001, 1, 102); +INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('17-JAN-2016', 1002, 2, 105); +INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('19-FEB-2016', 1002, 2, 106); +INSERT INTO orders(order_date,purchaser,quantity,product_id) +VALUES ('21-FEB-2016', 1003, 1, 107); ++EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = NULL, @supports_net_changes = 0; +GO +``` +## Download SQLServer CDC connector and its dependencies on SSH ++**WSL to ubuntu on local to check all dependencies related *flink-sql-connector-sqlserver-cdc* jar** ++``` +myvm@MININT-481C9TJ:/mnt/c/Work/99_tools/apache-maven-3.9.0/bin$ vim pom.xml ++<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> + <groupId>com.dep.download</groupId> + <artifactId>dep-download</artifactId> + <version>1.0-SNAPSHOT</version> +<!-- https://mvnrepository.com/artifact/com.ververica/flink-sql-connector-sqlserver-cdc --> + <dependency> + <groupId>com.ververica</groupId> + <artifactId>flink-sql-connector-sqlserver-cdc</artifactId> + <version>2.3.0</version> + </dependency> +</project> ++myvm@MININT-481C9TJ:/mnt/c/Work/99_tools/apache-maven-3.9.0/bin$ mkdir target ++myvm@MININT-481C9TJ:/mnt/c/Work/99_tools/apache-maven-3.9.0/bin$ /mnt/c/Work/99_tools/apache-maven-3.9.0/bin/mvn -DoutputDirectory=target -f pom.xml dependency:copy-dependencies +[INFO] Scanning for projects... 
++myvm@MININT-481C9TJ:/mnt/c/Work/99_tools/apache-maven-3.9.0/bin$ cd target +myvm@MININT-481C9TJ:/mnt/c/Work/99_tools/apache-maven-3.9.0/bin/target$ ll +total 19436 +drwxrwxrwx 1 msdata msdata 4096 Feb 9 08:39 ./ +drwxrwxrwx 1 msdata msdata 4096 Feb 9 08:37 ../ +-rwxrwxrwx 1 msdata msdata 85388 Feb 9 08:39 awaitility-4.0.1.jar* +-rwxrwxrwx 1 msdata msdata 3085931 Feb 9 08:39 flink-shaded-guava-30.1.1-jre-16.0.jar* +-rwxrwxrwx 1 msdata msdata 16556459 Feb 9 08:39 flink-sql-connector-sqlserver-cdc-2.3.0.jar* +-rwxrwxrwx 1 msdata msdata 123103 Feb 9 08:39 hamcrest-2.1.jar* +-rwxrwxrwx 1 msdata msdata 40502 Feb 9 08:39 slf4j-api-1.7.15.jar* +``` +**Let us download jars to SSH** +```sql +wget https://repo1.maven.org/maven2/com/ververica/flink-connector-sqlserver-cdc/2.4.0/flink-connector-sqlserver-cdc-2.4.0.jar +wget https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-guava/30.1.1-jre-16.0/flink-shaded-guava-30.1.1-jre-16.0.jar +wget https://repo1.maven.org/maven2/org/awaitility/awaitility/4.0.1/awaitility-4.0.1.jar +wget https://repo1.maven.org/maven2/org/hamcrest/hamcrest/2.1/hamcrest-2.1.jar +wget https://repo1.maven.org/maven2/net/java/loci/jsr308-all/1.1.2/jsr308-all-1.1.2.jar ++msdata@pod-0 [ ~/jar ]$ ls -l +total 6988 +-rw-r-- 1 msdata msdata 85388 Sep 6 2019 awaitility-4.0.1.jar +-rw-r-- 1 msdata msdata 107097 Jun 25 03:47 flink-connector-sqlserver-cdc-2.4.0.jar +-rw-r-- 1 msdata msdata 3085931 Sep 27 2022 flink-shaded-guava-30.1.1-jre-16.0.jar +-rw-r-- 1 msdata msdata 123103 Dec 20 2018 hamcrest-2.1.jar +-rw-r-- 1 msdata msdata 3742993 Mar 30 2011 jsr308-all-1.1.2.jar +``` ++### Add jar into sql-client.sh and connect to Flink SQL Client ++```sql +msdata@pod-0 [ ~ ]$ bin/sql-client.sh -j jar/flink-sql-connector-sqlserver-cdc-2.4.0.jar -j jar/flink-shaded-guava-30.1.1-jre-16.0.jar -j jar/hamcrest-2.1.jar -j jar/awaitility-4.0.1.jar -j jar/jsr308-all-1.1.2.jar +``` +## Create SQLServer CDC table ++``` sql +SET 'sql-client.execution.result-mode' = 'tableau'; ++CREATE TABLE orders ( + id INT, + order_date DATE, + purchaser INT, + quantity INT, + product_id INT, + PRIMARY KEY (id) NOT ENFORCED +) WITH ( + 'connector' = 'sqlserver-cdc', + 'hostname' = '<updatehostname>.database.windows.net', //update with the host name + 'port' = '1433', + 'username' = '<update-username>', //update with the user name + 'password' = '<update-password>', //update with the password + 'database-name' = 'inventory', + 'table-name' = 'dbo.orders' +); ++select * from orders; +``` +++### Perform changes on table from SQLServer side +++## Validation ++Monitor the table on Flink SQL ++++### Reference +* [SQLServer CDC Connector](https://ververica.github.io/flink-cdc-connectors/master/content/connectors/sqlserver-cdc.html) is licensed under [Apache 2.0 License](https://github.com/ververica/flink-cdc-connectors/blob/master/LICENSE) |
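For the "Perform changes on table from SQLServer side" step, any DML against the `orders` table is captured by the connector. For example, run statements like the following in the `inventory` database and watch the `select * from orders;` output update in the Flink SQL client; the values are illustrative and reuse the sample schema created earlier.

```sql
INSERT INTO orders(order_date,purchaser,quantity,product_id)
VALUES ('22-FEB-2016', 1004, 3, 103);

UPDATE orders SET quantity = 5 WHERE id = 10001;

DELETE FROM orders WHERE id = 10002;
```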
hdinsight-aks | Use Apache Nifi With Datastream Api | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-apache-nifi-with-datastream-api.md | + + Title: Use Apache NiFi with HDInsight on AKS Apache Flink to publish into ADLS Gen2 +description: Learn how to use Apache NiFi to consume Processed Kafka topic from HDInsight Apache Flink on AKS and publish into ADLS Gen2 ++ Last updated : 08/29/2023+++# Use Apache NiFi to consume processed Kafka topics from Apache Flink and publish into ADLS Gen2 +++Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. ++For more information, see [Apache NiFi](https://nifi.apache.org) ++In this document, we process streaming data using HDInsight Kafka and perform some transformations on HDInsight Apache Flink on AKS, consume these topics and write the contents into ADLS Gen2 on Apache NiFi. ++By combining the low latency streaming features of Apache Flink and the dataflow capabilities of Apache NiFi, you can process events at high volume. This combination helps you to trigger, enrich, filter, to enhance overall user experience. Both these technologies complement each other with their strengths in event streaming and correlation. ++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* [HDInsight Kafka](../../hdinsight/kafk) + * You're required to ensure the network settings are taken care as described on [Using HDInsight Kafka](../flink/process-and-consume-data.md); that's to make sure HDInsight on AKS Flink and HDInsight Kafka are in the same VNet +* For this demonstration, we're using a Window VM as maven project develop env in the same VNET as HDInsight on AKS +* For this demonstration, we're using an Ubuntu VM in the same VNET as HDInsight on AKS, install Apache NiFi 1.22.0 on this VM ++## Prepare HDInsight Kafka topic ++For purposes of this demonstration, we're using a HDInsight Kafka Cluster, let us prepare HDInsight Kafka topic for the demo. ++> [!NOTE] +> Setup a HDInsight [Kafka](../../hdinsight/kafk) Cluster and Replace broker list with your own list before you get started for both Kafka 2.4 and 3.2. ++**HDInsight Kafka 2.4.1** +``` +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic click_events --zookeeper zk0-contsk:2181 +``` ++**HDInsight Kafka 3.2.0** +``` +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic click_events --bootstrap-server wn0-contsk:9092 +``` +## Setup Apache NiFi 1.22.0 ++For this demo, we install Apache NiFi 1.22.0 on an Ubuntu VM in the same VNet as HDInsight Flink on AKS, or you can also use your NiFi setup. ++[Apache NiFi Downloads](https://nifi.apache.org/download.html) ++``` +root@contosoubuntuvm:/home/myvm/nifi-1.22.0/bin# ./nifi.sh start ++Java home: /home/myvm/jdk-18.0.1.1 +NiFi home: /home/myvm/nifi-1.22.0 ++Bootstrap Config File: /home/myvm/nifi-1.22.0/conf/bootstrap.conf +++root@contosoubuntuvm:/home/myvm/nifi-1.22.0/bin# jps +454421 NiFi +454467 Jps +454396 RunNiFi +``` ++**Configuring NiFi UI** ++Here, we configure NiFi properties in order to be accessed outside the localhost VM. 
++`$nifi_home/conf/nifi.properties` +++## Process streaming data from HDInsight Kafka On HDInsight on AKS Flink ++Let us develop the source code on Maven, to build the jar. ++**SinkToKafka.java** ++``` java +package contoso.example; ++import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.api.common.typeinfo.Types; +import org.apache.flink.connector.base.DeliveryGuarantee; +import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema; +import org.apache.flink.connector.kafka.sink.KafkaSink; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.datastream.DataStreamSource; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +++public class SinkToKafka { + public static void main(String[] args) throws Exception { + // 1. get stream env, update the broker-ips with your own + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + String brokers = "<update-brokerip>:9092,<update-brokerip>:9092,<update-brokerip>:9092"; // Replace the broker list with your own ++ // 2. event data source + DataStreamSource<Event> stream = env.addSource(new ClickSource()); ++ DataStream<String> dataStream = stream.map(line-> { + String str1 = line.toString(); + return str1; + }).returns(Types.STRING); ++ // 3. sink click events to kafka + KafkaSink<String> sink = KafkaSink.<String>builder() + .setBootstrapServers(brokers) + .setRecordSerializer(KafkaRecordSerializationSchema.builder() + .setTopic("click_events") + .setValueSerializationSchema(new SimpleStringSchema()) + .build() + ) + .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE) + .build(); ++ dataStream.sinkTo(sink); + env.execute("Sink click events to Kafka"); + } +} +``` ++**Event.java** +``` java +import java.sql.Timestamp; ++public class Event { ++ public String user; + public String url; + public String ts; + public Event() { + } ++ public Event(String user, String url, String ts) { + this.user = user; + this.url = url; + this.ts = ts; + } ++ @Override + public String toString(){ + return "\"" + ts + "\"" + "," + "\"" + user + "\"" + "," + "\"" + url + "\""; + } +} +``` ++**ClickSource.java** +``` java +import org.apache.flink.streaming.api.functions.source.SourceFunction; +import java.util.Calendar; +import java.util.Random; ++public class ClickSource implements SourceFunction<Event> { + // declare a flag + private Boolean running = true; ++ // declare a flag + public void run(SourceContext<Event> ctx) throws Exception{ + // generate random record + Random random = new Random(); + String[] users = {"Mary","Alice","Bob","Cary"}; + String[] urls = {"./home","./cart","./fav","./prod?id=100","./prod?id=10"}; ++ // loop generate + while (running) { + String user = users[random.nextInt(users.length)]; + String url = urls[random.nextInt(urls.length)]; + Long timestamp = Calendar.getInstance().getTimeInMillis(); + String ts = timestamp.toString(); + ctx.collect(new Event(user,url,ts)); +// Thread.sleep(2000); + } + } + @Override + public void cancel() + { + running = false; + } +} +``` +**Maven pom.xml** ++You can replace 2.4.1 with 3.2.0 in case you're using HDInsight Kafka 3.2.0, where applicable on the pom.xml ++``` xml +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + 
<modelVersion>4.0.0</modelVersion> ++ <groupId>org.example</groupId> + <artifactId>FlinkDemoKafka</artifactId> + <version>1.0-SNAPSHOT</version> + <properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> + <kafka.version>2.4.1</kafka.version> > Replace 2.4.1 with 3.2.0 , in case you're using HDInsight Kafka 3.2.0 + </properties> + <dependencies> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> + </dependency> + </dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` ++## Submit streaming job to HDInsight on AKS - Flink ++Now, lets submit streaming job as mentioned in the previous step into HDInsight on AKS - Flink +++## Check the topic on HDInsight Kafka ++Check the topic on HDInsight Kafka. ++``` +root@hn0-contos:/home/sshuser# /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --topic click_events --bootstrap-server wn0-contos:9092 +"1685939238525","Cary","./home" +"1685939240527","Bob","./fav" +"1685939242528","Cary","./prod?id=10" +"1685939244528","Mary","./prod?id=100" +"1685939246529","Alice","./fav" +"1685939248530","Mary","./cart" +"1685939250530","Mary","./prod?id=100" +"1685939252530","Alice","./prod?id=100" +"1685939254530","Alice","./prod?id=10" +"1685939256530","Cary","./prod?id=100" +"1685939258531","Mary","./prod?id=10" +"1685939260531","Cary","./home" +"1685939262531","Mary","./prod?id=10" +"1685939264531","Cary","./prod?id=100" +"1685939266532","Mary","./cart" +"1685939268532","Bob","./fav" +"1685939270532","Mary","./home" +"1685939272533","Cary","./fav" +"1685939274533","Alice","./cart" +"1685939276533","Bob","./prod?id=10" +"1685939278533","Mary","./cart" +"1685939280533","Alice","./fav" +``` ++## Create flow on NiFi UI ++> [!NOTE] +> In this example, we use Azure User Managed Identity to credentials for ADLS Gen2. ++In this demonstration, we have used Apache NiFi instance installed on an Ubuntu VM. We're accessing the NiFi web interface from a Windows VM. 
The Ubuntu VM needs to have a managed identity assigned to it and network security group (NSG) rules configured. ++To use managed identity authentication with the PutAzureDataLakeStorage processor in NiFi, you're required to ensure that the Ubuntu VM on which NiFi is installed has a managed identity assigned to it; if it doesn't, assign a managed identity to the Ubuntu VM. +++Once you have assigned a managed identity to the Azure VM, you need to make sure that the VM can connect to the IMDS (Instance Metadata Service) endpoint. The IMDS endpoint is available at the IP address shown in this example. You need to update your network security group rules to allow outbound traffic from the Ubuntu VM to this IP address. +++**Run the flow:** +++[**Using Processor ConsumeKafka_2_0's properties setting:**](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-2-0-nar/1.22.0/org.apache.nifi.processors.kafka.pubsub.ConsumeKafka_2_0/index.html) ++++[**Using Processor PutAzureDataLakeStorage properties setting:**](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.22.0/org.apache.nifi.processors.azure.storage.PutAzureDataLakeStorage/index.html) +++[**Using PutAzureDataLakeStorage credential setting:**](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.22.0/org.apache.nifi.services.azure.storage.ADLSCredentialsControllerService/index.html) +++### Let's check the output in ADLS Gen2 +++## Reference ++* [Apache NiFi](https://nifi.apache.org) +* [Apache NiFi Downloads](https://nifi.apache.org/download.html) +* [Consume Kafka](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-2-0-nar/1.11.4/org.apache.nifi.processors.kafka.pubsub.ConsumeKafka_2_0/index.html) +* [Azure Data Lake Storage](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.12.0/org.apache.nifi.processors.azure.storage.PutAzureDataLakeStorage/index.html) +* [ADLS Credentials Controller Service](https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-azure-nar/1.12.0/org.apache.nifi.services.azure.storage.ADLSCredentialsControllerService/index.html) +* [Download IntelliJ IDEA for development](https://www.jetbrains.com/idea/download/#section=windows) |
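If you prefer a command-line check over the portal, you can list the files written by the PutAzureDataLakeStorage processor with the Azure CLI. The container, storage account, and directory names below are placeholders for your own ADLS Gen2 setup.

```
az storage fs file list \
  --file-system <container-name> \
  --account-name <storage-account-name> \
  --path <output-directory> \
  --auth-mode login \
  --output table
```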
hdinsight-aks | Use Azure Pipelines To Run Flink Jobs | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-azure-pipelines-to-run-flink-jobs.md | + + Title: How to use Azure Pipelines with HDInsight on AKS - Flink +description: Learn how to use Azure Pipelines with HDInsight on AKS - Flink ++ Last updated : 09/25/2023+++# How to use Azure Pipelines with HDInsight on AKS - Flink +++In this article, you'll learn how to use Azure Pipelines with HDInsight on AKS to submit Flink jobs via the cluster's REST API. We guide you through the process using a sample YAML pipeline and a PowerShell script, both of which streamline the automation of the REST API interactions. +++## Prerequisites ++- Azure subscription. If you do not have an Azure subscription, create a free account. ++- A GitHub account where you can create a repository. [Create one for free](https://azure.microsoft.com/free). ++- Create `.pipeline` directory, copy [flink-azure-pipelines.yml](https://hdiconfigactions.blob.core.windows.net/hiloflinkblob/flink-azure-pipelines.yml) and [flink-job-azure-pipeline.ps1](https://hdiconfigactions.blob.core.windows.net/hiloflinkblob/flink-job-azure-pipeline.ps1) ++- Azure DevOps organization. Create one for free. If your team already has one, then make sure you are an administrator of the Azure DevOps project that you want to use. ++- Ability to run pipelines on Microsoft-hosted agents. To use Microsoft-hosted agents, your Azure DevOps organization must have access to Microsoft-hosted parallel jobs. You can either purchase a parallel job or you can request a free grant. ++- A Flink Cluster. If you don’t have one, [Create a Flink Cluster in HDInsight on AKS](flink-create-cluster-portal.md). ++- Create one directory in cluster storage account to copy job jar. This directory later you need to configure in pipeline YAML for job jar location (<JOB_JAR_STORAGE_PATH>). ++## Steps to set up pipeline ++### Create a service principal for Azure Pipelines ++ Create [Azure AD Service Principal](/cli/azure/ad/sp/) to access Azure – Grant permission to access HDInsight on AKS Cluster with Contributor role, make a note of appId, password, and tenant from the response. + ``` + az ad sp create-for-rbac -n <service_principal_name> --role Contributor --scopes <Flink Cluster Resource ID>` + ``` + + Example: ++ ``` + az ad sp create-for-rbac -n azure-flink-pipeline --role Contributor --scopes /subscriptions/abdc-1234-abcd-1234-abcd-1234/resourceGroups/myResourceGroupName/providers/Microsoft.HDInsight/clusterpools/hiloclusterpool/clusters/flinkcluster` + ``` ++### Create a key vault ++ 1. Create Azure Key Vault, you can follow [this tutorial](/azure/key-vault/general/quick-create-portal) to create a new Azure Key Vault. ++ 1. Create three Secrets ++ - *cluster-storage-key* for storage key. ++ - *service-principal-key* for principal clientId or appId. ++ - *service-principal-secret* for principal secret. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/create-key-vault.png" alt-text="Screenshot showing how to create key vault." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/create-key-vault.png"::: ++ 1. Grant permission to access Azure Key Vault with the “Key Vault Secrets Officer” role to service principal. +++### Setup pipeline ++ 1. Navigate to your Project and click Project Settings. ++ 1. Scroll down and select Service Connections, and then New Service Connection. ++ 1. Select Azure Resource Manager. 
++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/select-new-service-connection.png" alt-text="Screenshot showing how to select a new service connection." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/select-new-service-connection.png"::: ++ 1. In the authentication method, select Service Principal (manual). ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/new-service-connection.png" alt-text="Screenshot shows new service connection." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/new-service-connection.png"::: ++ 1. Edit the service connection properties. Select the service principal you recently created. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/edit-service-connection.png" alt-text="Screenshot showing how to edit service connection." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/edit-service-connection.png"::: ++ 1. Click Verify to check whether the connection was set up correctly. If you encounter the following error: ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/service-connection-error-message.png" alt-text="Screenshot showing service connection error message." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/service-connection-error-message.png"::: + + 1. Then you need to assign the Reader role to the subscription. ++ 1. After that, the verification should be successful. ++ 1. Save the service connection. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/tenant-id.png" alt-text="Screenshot showing how to view the Tenant-ID." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/tenant-id.png"::: ++ 1. Navigate to pipelines and click on New Pipeline. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/create-new-pipeline.png" alt-text="Screenshot showing how to create a new pipeline." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/create-new-pipeline.png"::: ++ 1. Select GitHub as the location of your code. ++ 1. Select the repository. See [how to create a repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-new-repository) in GitHub. select-github-repo image. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/search-your-code.png" alt-text="Screenshot showing how to search your code." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/search-your-code.png"::: + ++ 1. Select the repository. For more information, [see How to create a repository in GitHub](https://docs.github.com/repositories/creating-and-managing-repositories/creating-a-new-repository). ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/select-github-repo.png" alt-text="Screenshot showing how to select a GitHub repository." lightbox="./media/use-azure-pipelines-to-run-flink-jobs/select-github-repo.png"::: + + 1. From configure your pipeline option, you can choose **Existing Azure Pipelines YAML file**. Select branch and pipeline script that you copied earlier. (.pipeline/flink-azure-pipelines.yml) ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/configure-pipeline.png" alt-text="Screenshot showing how to configure pipeline."::: ++ 1. Replace value in variable section. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/replace-value.png" alt-text="Screenshot showing how to replace value."::: ++ 1. 
Adjust the code build section based on your requirements, and configure <JOB_JAR_LOCAL_PATH> in the variable section for the job jar local path. ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/code-build-section.png" alt-text="Screenshot shows code build section."::: ++ 1. Add the pipeline variable "action" and configure the value "RUN." ++ :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/pipeline-variable.png" alt-text="Screenshot shows how to add pipeline variable."::: ++ You can change the value of the variable before running the pipeline. ++ - NEW: This value is the default. It launches a new job; if the job is already running, it updates the running job with the latest jar. ++ - SAVEPOINT: This value takes a savepoint for the running job. ++ - DELETE: This value cancels or deletes the running job. ++ 1. Save and run the pipeline. You can see the running job on the portal in the Flink Job section. + + :::image type="content" source="./media/use-azure-pipelines-to-run-flink-jobs/save-run-pipeline.png" alt-text="Screenshot shows how to save and run pipeline."::: +++> [!NOTE] +> This is one sample of submitting a job using a pipeline. You can follow the Flink REST API documentation to write your own code to submit jobs. |
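The Key Vault role assignment described in the setup steps can also be granted from the Azure CLI. The following is a sketch; the service principal appId and the Key Vault resource ID components are placeholders for your own values.

```
az role assignment create \
  --assignee <service-principal-appId> \
  --role "Key Vault Secrets Officer" \
  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.KeyVault/vaults/<key-vault-name>
```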
hdinsight-aks | Use Flink Cli To Submit Jobs | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-flink-cli-to-submit-jobs.md | + + Title: How to use Apache Flink CLI to submit jobs +description: Learn how to use Apache Flink CLI to submit jobs ++ Last updated : 08/29/2023+++# Apache Flink Command-Line Interface (CLI) +++Apache Flink provides a CLI (Command Line Interface) **bin/flink** to run jobs (programs) that are packaged as JAR files and to control their execution. The CLI is part of the Flink setup and can be set up on a single-node VM. It connects to the running JobManager specified in **conf/flink-conf.yaml**. ++## Installation Steps ++To install Flink CLI on Linux, you need a **Linux VM** to execute the installation script. You need to run a bash environment if you are on **Windows**. ++> [!NOTE] +> This does NOT work on Windows **GIT BASH**, you need to install [WSL](/windows/wsl/install) to make this work on Windows. ++### Requirements +* Install JRE 11. If not installed, follow the steps described in `/java/openjdk/download`. +* Add java to PATH or define JAVA_HOME environment variable pointing to JRE installation directory, such that `$JAVA_HOME/bin/java` exists. ++### Install or update ++Both installing and updating the CLI require rerunning the install script. Install the CLI by running curl. ++```bash +curl -L https://aka.ms/hdionaksflinkcliinstalllinux | bash +``` ++This command installs Flink CLI in the user's home directory (`$HOME/flink-cli`). The script can also be downloaded and run locally. You may have to restart your shell in order for changes to take effect. ++## Run an Apache Flink command to test ++ ```bash + cd $HOME/flink-cli ++ bin/flink list -D azure.tenant.id=<update-tenant-id> -D rest.address=<flink-cluster-fqdn> + ``` + > [!NOTE] + > If executing via SSH pod, use the command ```bin/flink list``` to give you the complete output. ++ If you don't want to add those parameters every time, add them to **conf/flink-conf.yaml**. + + ```bash + rest.address: <flink-cluster-fqdn> + azure.tenant.id: <tenant-id> + ``` + Now the command becomes + + ```bash + bin/flink list + ``` ++ You should see output like the following: ++ ```output + To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code E4LW35GFD to authenticate. + ``` ++ Open [https://microsoft.com/devicelogin](https://microsoft.com/devicelogin) in your browser, and enter the code, then use your microsoft.com ID to log in. After successful login, you should see output like the following if no job is running. + + ```output + Waiting for response... + No running jobs. + No scheduled jobs. + ``` ++#### curl `Object Moved` error ++If you get an error from curl related to the -L parameter, or an error message including the text "Object Moved", try using the full URL instead of the aka.ms redirect: ++```bash +curl https://hdiconfigactions.blob.core.windows.net/hiloflinkblob/install.sh | bash +``` ++## Examples +Here are some examples of actions supported by Flink’s CLI tool: ++| Action | Purpose | +|-|-| +| run | This action executes jobs. It requires at least the jar containing the job. Flink- or job-related arguments can be passed if necessary. | +| info | This action can be used to print an optimized execution graph of the passed job. Again, the jar containing the job needs to be passed. 
| +| list | This action *lists all running or scheduled jobs*.| +| savepoint | This action can be used to *create or dispose of savepoints* for a given job. It might be necessary to specify a savepoint directory besides the JobID. | +| cancel | This action can be used to *cancel running jobs* based on their JobID. | +| stop | This action combines the *cancel and savepoint actions to stop* a running job; it also creates a savepoint to start from again. | ++All actions and their parameters can be accessed through the following command: ++```bash +bin/flink --help +``` ++To see the usage information of an individual action, run: ++```bash +bin/flink <action> --help +``` ++> [!TIP] +> * If a proxy is blocking the connection: to get the installation scripts, your proxy needs to allow HTTPS connections to the following addresses: `https://aka.ms/` and `https://hdiconfigactions.blob.core.windows.net` +> * To resolve the issue, add the user or group to the [authorization profile](../hdinsight-on-aks-manage-authorization-profile.md). |
hdinsight-aks | Use Flink Delta Connector | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-flink-delta-connector.md | + + Title: How to use Apache Flink & Delta connector in HDInsight on AKS +description: Learn how to use Apache Flink-Delta connector ++ Last updated : 08/29/2023+++# How to use Apache Flink-Delta connector +++By using Apache Flink and Delta Lake together, you can create a reliable and scalable data lakehouse architecture. The Flink/Delta Connector allows you to write data to Delta tables with ACID transactions and exactly once processing. It means that your data streams are consistent and error-free, even if you restart your Flink pipeline from a checkpoint. The Flink/Delta Connector ensures that your data isn't lost or duplicated, and that it matches the Flink semantics. ++In this article, you learn how to use Flink-Delta connector ++> [!div class="checklist"] +> * Read the data from the delta table. +> * Write the data to a delta table. +> * Query it in Power BI. ++## What is Apache Flink-Delta connector ++Flink-Delta Connector is a JVM library to read and write data from Apache Flink applications to Delta tables utilizing the Delta Standalone JVM library. The connector provides exactly once delivery guarantee. ++## Apache Flink-Delta Connector includes ++* DeltaSink for writing data from Apache Flink to a Delta table. +* DeltaSource for reading Delta tables using Apache Flink. ++We are using the following connector, to match with the HDInsight on AKS Flink version. ++|Connector's version| Flink's version| +|-|-| +|0.6.0 |X >= 1.15.3| ++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](./flink-create-cluster-portal.md) +* storage account +* [Power BI desktop](https://www.microsoft.com/download/details.aspx?id=58494) ++## Read data from delta table ++There are two types of delta sources, when it comes to reading data from delta table. ++* Bounded: Batch processing +* Continuous: Streaming processing ++In this example, we're using a bounded state of delta source. 
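If you need the streaming variant instead, only the source builder changes: `DeltaSource.forContinuousRowData` keeps monitoring the Delta table for new versions instead of finishing. A minimal sketch, assuming the same `io.delta:delta-flink:0.6.0` dependency used in this article and a placeholder ABFS path:

```java
import io.delta.flink.source.DeltaSource;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.hadoop.conf.Configuration;

public class ContinuousDeltaSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Continuous source: after reading the current snapshot, it keeps polling the
        // Delta log for new table versions instead of completing.
        DeltaSource<RowData> source = DeltaSource.forContinuousRowData(
                new Path("abfss://<container>@<storageaccount>.dfs.core.windows.net/<delta-table-path>"),
                new Configuration())
            .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "continuous-delta-source")
           .print();

        env.execute("Continuous Delta source example");
    }
}
```
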
++**Sample xml file** ++```xml +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> ++ <groupId>org.example.flink.delta</groupId> + <artifactId>flink-delta</artifactId> + <version>1.0-SNAPSHOT</version> + <packaging>jar</packaging> ++ <name>Flink Quickstart Job</name> ++ <properties> + <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> + <flink.version>1.16.0</flink.version> + <target.java.version>1.8</target.java.version> + <scala.binary.version>2.12</scala.binary.version> + <maven.compiler.source>${target.java.version}</maven.compiler.source> + <maven.compiler.target>${target.java.version}</maven.compiler.target> + <log4j.version>2.17.1</log4j.version> + </properties> ++ <repositories> + <repository> + <id>apache.snapshots</id> + <name>Apache Development Snapshot Repository</name> + <url>https://repository.apache.org/content/repositories/snapshots/</url> + <releases> + <enabled>false</enabled> + </releases> + <snapshots> + <enabled>true</enabled> + </snapshots> + </repository> +<!-- <repository>--> +<!-- <id>delta-standalone_2.12</id>--> +<!-- <url>file://C:\Users\varastogi\Workspace\flink-main\flink-k8s-operator\target</url>--> +<!-- </repository>--> + </repositories> ++ <dependencies> + <!-- Apache Flink dependencies --> + <!-- These dependencies are provided, because they should not be packaged into the JAR file. --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + <scope>provided</scope> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + <scope>provided</scope> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + <scope>provided</scope> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-base</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-files</artifactId> + <version>${flink.version}</version> + </dependency> +<!-- <dependency>--> +<!-- <groupId>io.delta</groupId>--> +<!-- <artifactId>delta-standalone_2.12</artifactId>--> +<!-- <version>4.0.0</version>--> +<!-- <scope>system</scope>--> +<!-- <systemPath>C:\Users\varastogi\Workspace\flink-main\flink-k8s-operator\target\io\delta\delta-standalone_2.12\4.0.0\delta-standalone_2.12-4.0.0.jar</systemPath>--> +<!-- </dependency>--> + <dependency> + <groupId>io.delta</groupId> + <artifactId>delta-standalone_2.12</artifactId> + <version>0.6.0</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-mapreduce-client-core</artifactId> + <version>3.2.1</version> + </dependency> + <dependency> + <groupId>io.delta</groupId> + <artifactId>delta-flink</artifactId> + <version>0.6.0</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-parquet</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.parquet</groupId> + 
<artifactId>parquet-common</artifactId> + <version>1.12.2</version> + </dependency> + <dependency> + <groupId>org.apache.parquet</groupId> + <artifactId>parquet-column</artifactId> + <version>1.12.2</version> + </dependency> + <dependency> + <groupId>org.apache.parquet</groupId> + <artifactId>parquet-hadoop</artifactId> + <version>1.12.2</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-azure</artifactId> + <version>3.3.2</version> + </dependency> +<!-- <dependency>--> +<!-- <groupId>org.apache.hadoop</groupId>--> +<!-- <artifactId>hadoop-azure</artifactId>--> +<!-- <version>3.3.4</version>--> +<!-- </dependency>--> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-mapreduce-client-core</artifactId> + <version>3.2.1</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-client</artifactId> + <version>3.3.2</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-common</artifactId> + <version>${flink.version}</version> +<!-- <scope>provided</scope>--> + </dependency> + <dependency> + <groupId>org.apache.parquet</groupId> + <artifactId>parquet-hadoop-bundle</artifactId> + <version>1.10.0</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-runtime</artifactId> + <version>${flink.version}</version> + <scope>provided</scope> + </dependency> +<!-- <dependency>--> +<!-- <groupId>org.apache.flink</groupId>--> +<!-- <artifactId>flink-table-common</artifactId>--> +<!-- <version>${flink.version}</version>--> +<!-- </dependency>--> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-common</artifactId> + <version>3.3.2</version> + </dependency> ++ <!-- Add connector dependencies here. They must be in the default scope (compile). --> ++ <!-- Example: ++ <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + --> ++ <!-- Add logging framework, to produce console output when running in the IDE. --> + <!-- These dependencies are excluded from the application JAR by default. --> + <dependency> + <groupId>org.apache.logging.log4j</groupId> + <artifactId>log4j-slf4j-impl</artifactId> + <version>${log4j.version}</version> + <scope>runtime</scope> + </dependency> + <dependency> + <groupId>org.apache.logging.log4j</groupId> + <artifactId>log4j-api</artifactId> + <version>${log4j.version}</version> + <scope>runtime</scope> + </dependency> + <dependency> + <groupId>org.apache.logging.log4j</groupId> + <artifactId>log4j-core</artifactId> + <version>${log4j.version}</version> + <scope>runtime</scope> + </dependency> + </dependencies> ++ <build> + <plugins> ++ <!-- Java Compiler --> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-compiler-plugin</artifactId> + <version>3.1</version> + <configuration> + <source>${target.java.version}</source> + <target>${target.java.version}</target> + </configuration> + </plugin> ++ <!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. --> + <!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. 
--> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-shade-plugin</artifactId> + <version>3.1.1</version> + <executions> + <!-- Run shade goal on package phase --> + <execution> + <phase>package</phase> + <goals> + <goal>shade</goal> + </goals> + <configuration> + <createDependencyReducedPom>false</createDependencyReducedPom> + <artifactSet> + <excludes> + <exclude>org.apache.flink:flink-shaded-force-shading</exclude> + <exclude>com.google.code.findbugs:jsr305</exclude> + <exclude>org.slf4j:*</exclude> + <exclude>org.apache.logging.log4j:*</exclude> + </excludes> + </artifactSet> + <filters> + <filter> + <!-- Do not copy the signatures in the META-INF folder. + Otherwise, this might cause SecurityExceptions when using the JAR. --> + <artifact>*:*</artifact> + <excludes> + <exclude>META-INF/*.SF</exclude> + <exclude>META-INF/*.DSA</exclude> + <exclude>META-INF/*.RSA</exclude> + </excludes> + </filter> + </filters> + <transformers> + <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> + <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> + <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"> + <mainClass>org.example.flink.delta.DataStreamJob</mainClass> + </transformer> + </transformers> + </configuration> + </execution> + </executions> + </plugin> + </plugins> ++ <pluginManagement> + <plugins> ++ <!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. --> + <plugin> + <groupId>org.eclipse.m2e</groupId> + <artifactId>lifecycle-mapping</artifactId> + <version>1.0.0</version> + <configuration> + <lifecycleMappingMetadata> + <pluginExecutions> + <pluginExecution> + <pluginExecutionFilter> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-shade-plugin</artifactId> + <versionRange>[3.1.1,)</versionRange> + <goals> + <goal>shade</goal> + </goals> + </pluginExecutionFilter> + <action> + <ignore/> + </action> + </pluginExecution> + <pluginExecution> + <pluginExecutionFilter> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-compiler-plugin</artifactId> + <versionRange>[3.1,)</versionRange> + <goals> + <goal>testCompile</goal> + <goal>compile</goal> + </goals> + </pluginExecutionFilter> + <action> + <ignore/> + </action> + </pluginExecution> + </pluginExecutions> + </lifecycleMappingMetadata> + </configuration> + </plugin> + </plugins> + </pluginManagement> + </build> +</project> +``` +* You're required to build the jar with required libraries and dependencies. +* Specify the ADLS Gen2 location in our java class to reference the source data. ++ + ```java + public StreamExecutionEnvironment createPipeline( + String tablePath, + int sourceParallelism, + int sinkParallelism) { ++ DeltaSource<RowData> deltaSink = getDeltaSource(tablePath); + StreamExecutionEnvironment env = getStreamExecutionEnvironment(); ++ env + .fromSource(deltaSink, WatermarkStrategy.noWatermarks(), "bounded-delta-source") + .setParallelism(sourceParallelism) + .addSink(new ConsoleSink(Utils.FULL_SCHEMA_ROW_TYPE)) + .setParallelism(1); ++ return env; + } ++ /** + * An example of Flink Delta Source configuration that will read all columns from Delta table + * using the latest snapshot. + */ + @Override + public DeltaSource<RowData> getDeltaSource(String tablePath) { + return DeltaSource.forBoundedRowData( + new Path(tablePath), + new Configuration() + ).build(); + } + ``` ++1. 
Call the read class while submitting the job using [Flink CLI](./flink-web-ssh-on-portal-to-flink-sql.md). ++ :::image type="content" source="./media/use-flink-delta-connector/call-the-read-class.png" alt-text="Screenshot shows how to call the read class file." lightbox="./media/use-flink-delta-connector/call-the-read-class.png"::: ++1. After submitting the job, + 1. Check the status and metrics on Flink UI. + 1. Check the job manager logs for more details. ++ :::image type="content" source="./media/use-flink-delta-connector/check-job-manager-logs.png" alt-text="Screenshot shows job manager logs." lightbox="./media/use-flink-delta-connector/check-job-manager-logs.png"::: ++## Writing to Delta sink ++The delta sink is used for writing the data to a delta table in ADLS gen2. The data stream consumed by the delta sink. +1. Build the jar with required libraries and dependencies. +1. Enable checkpoint for delta logs to commit the history. ++ :::image type="content" source="./media/use-flink-delta-connector/enable-checkpoint-for-delta-logs.png" alt-text="Screenshot shows how enable checkpoint for delta logs." lightbox="./media/use-flink-delta-connector/enable-checkpoint-for-delta-logs.png"::: + + ```java + public StreamExecutionEnvironment createPipeline( + String tablePath, + int sourceParallelism, + int sinkParallelism) { ++ DeltaSink<RowData> deltaSink = getDeltaSink(tablePath); + StreamExecutionEnvironment env = getStreamExecutionEnvironment(); ++ // Using Flink Delta Sink in processing pipeline + env + .addSource(new DeltaExampleSourceFunction()) + .setParallelism(sourceParallelism) + .sinkTo(deltaSink) + .name("MyDeltaSink") + .setParallelism(sinkParallelism); ++ return env; + } ++ /** + * An example of Flink Delta Sink configuration. + */ + @Override + public DeltaSink<RowData> getDeltaSink(String tablePath) { + return DeltaSink + .forRowData( + new Path(TABLE_PATH), + new Configuration(), + Utils.FULL_SCHEMA_ROW_TYPE) + .build(); + } + ``` +1. Call the delta sink class while submitting the job via Flink CLI. +1. Specify the account key of the storage account in `flink-client-config` using [Flink configuration management](./flink-configuration-management.md). You can specify the account key of the storage account in Flink config. `fs.azure.<storagename>.dfs.core.windows.net : <KEY >` ++ :::image type="content" source="./media/use-flink-delta-connector/call-the-delta-sink-class.png" alt-text="Screenshot shows how to call the delta sink class." lightbox="./media/use-flink-delta-connector/call-the-delta-sink-class.png"::: ++1. Specify the path of ADLS Gen2 storage account while specifying the delta sink properties. +1. Once the job is submitted, check the status and metrics on Flink UI. ++ :::image type="content" source="./media/use-flink-delta-connector/check-the-status-on-flink-ui.png" alt-text="Screenshot shows status on Flink UI." lightbox="./media/use-flink-delta-connector/check-the-status-on-flink-ui.png"::: ++ :::image type="content" source="./media/use-flink-delta-connector/view-the-checkpoints-on-flink-ui.png" alt-text="Screenshot shows the checkpoints on Flink-UI." lightbox="./media/use-flink-delta-connector/view-the-checkpoints-on-flink-ui.png"::: ++ :::image type="content" source="./media/use-flink-delta-connector/view-the-metrics-on-flink-ui.png" alt-text="Screenshot shows the metrics on Flink UI." 
lightbox="./media/use-flink-delta-connector/view-the-metrics-on-flink-ui.png"::: ++## Power BI integration ++Once the data is in delta sink, you can run the query in Power BI desktop and create a report. +1. Open your Power BI desktop and get the data using ADLS Gen2 connector. ++ :::image type="content" source="./media/use-flink-delta-connector/view-power-bi-desktop.png" alt-text="Screenshot shows Power BI desktop."::: ++ :::image type="content" source="./media/use-flink-delta-connector/view-adls-gen2-connector.png" alt-text="Screenshot shows ADLSGen 2 connector."::: ++1. URL of the storage account. ++ :::image type="content" source="./media/use-flink-delta-connector/url-of-the-storage-account.png" alt-text="Screenshot showing the URL of the storage account."::: ++ :::image type="content" source="./media/use-flink-delta-connector/adls-gen-2-details.png" alt-text="Screenshot shows ADLS Gen2-details."::: ++1. Create M-query for the source and invoke the function, which queries the data from storage account. Refer [Delta Power BI connectors](https://github.com/delta-io/connectors/tree/master/powerbi). ++1. Once the data is readily available, you can create reports. ++ :::image type="content" source="./media/use-flink-delta-connector/create-reports.png" alt-text="Screenshot shows how to create reports."::: ++## References ++* [Delta connectors](https://github.com/delta-io/connectors/tree/master/flink). +* [Delta Power BI connectors](https://github.com/delta-io/connectors/tree/master/powerbi). |
hdinsight-aks | Use Flink To Sink Kafka Message Into Hbase | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-flink-to-sink-kafka-message-into-hbase.md | + + Title: Write messages to HBase with DataStream API +description: Learn how to write messages to HBase with DataStream API ++ Last updated : 08/29/2023+++# Write messages to HBase with DataStream API +++In this article, learn how to write messages to HBase with Apache Flink DataStream API ++## Overview ++Apache Flink offers HBase connector as a sink, with this connector with Flink you can store the output of a real-time processing application in HBase. Learn how to process streaming data on HDInsight Kafka as a source, perform transformations, then sink into HDInsight HBase table. ++In a real world scenario, this example is a stream analytics layer to realize value from Internet of Things (IOT) analytics, which use live sensor data. The Flink Stream can read data from Kafka topic and write it to HBase table. If there is a real time streaming IOT application, the information can be gathered, transformed and optimized. +++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0](../flink/flink-create-cluster-portal.md) +* [HDInsight Kafka](../flink/process-and-consume-data.md) +* [HDInsight HBase 2.4.11](../../hdinsight/hbase/apache-hbase-tutorial-get-started-linux.md#create-apache-hbase-cluster) + * You're required to make sure HDInsight on AKS Flink can connect to HDInsight HBase Master(zk), with same virtual network. +* Maven project on IntelliJ IDEA for development on an Azure VM in the same VNet ++## Implementation Steps ++### Use pipeline to produce Kafka topic (user click event topic) ++**weblog.py** ++``` python +import json +import random +import time +from datetime import datetime ++user_set = [ + 'John', + 'XiaoMing', + 'Mike', + 'Tom', + 'Machael', + 'Zheng Hu', + 'Zark', + 'Tim', + 'Andrew', + 'Pick', + 'Sean', + 'Luke', + 'Chunck' +] ++web_set = [ + 'https://github.com', + 'https://www.bing.com/new', + 'https://kafka.apache.org', + 'https://hbase.apache.org', + 'https://flink.apache.org', + 'https://spark.apache.org', + 'https://trino.io', + 'https://hadoop.apache.org', + 'https://stackoverflow.com', + 'https://docs.python.org', + 'https://azure.microsoft.com/products/category/storage', + '/azure/hdinsight/hdinsight-overview', + 'https://azure.microsoft.com/products/category/storage' +] ++def main(): + while True: + if random.randrange(13) < 4: + url = random.choice(web_set[:3]) + else: + url = random.choice(web_set) ++ log_entry = { + 'userName': random.choice(user_set), + 'visitURL': url, + 'ts': datetime.now().strftime("%m/%d/%Y %H:%M:%S") + } ++ print(json.dumps(log_entry)) + time.sleep(0.05) ++if __name__ == "__main__": + main() +``` ++**Use pipeline to produce Kafka topic** ++We're going to use click_events for the Kafka topic +``` +python weblog.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic click_events +``` ++**Sample commands on Kafka** +``` +-- create topic (replace with your Kafka bootstrap server) +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic click_events --bootstrap-server wn0-contsk:9092 ++-- delete topic (replace with your Kafka bootstrap server) +/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --delete --topic click_events --bootstrap-server wn0-contsk:9092 ++-- produce topic (replace with your Kafka bootstrap server) +python weblog.py | 
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic click_events ++-- consume topic +/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server wn0-contsk:9092 --topic click_events --from-beginning +{"userName": "Luke", "visitURL": "https://azure.microsoft.com/products/category/storage", "ts": "07/11/2023 06:39:43"} +{"userName": "Sean", "visitURL": "https://www.bing.com/new", "ts": "07/11/2023 06:39:43"} +{"userName": "XiaoMing", "visitURL": "https://hbase.apache.org", "ts": "07/11/2023 06:39:43"} +{"userName": "Machael", "visitURL": "https://www.bing.com/new", "ts": "07/11/2023 06:39:43"} +{"userName": "Andrew", "visitURL": "https://github.com", "ts": "07/11/2023 06:39:43"} +{"userName": "Zark", "visitURL": "https://kafka.apache.org", "ts": "07/11/2023 06:39:43"} +{"userName": "XiaoMing", "visitURL": "https://trino.io", "ts": "07/11/2023 06:39:43"} +{"userName": "Zark", "visitURL": "https://flink.apache.org", "ts": "07/11/2023 06:39:43"} +{"userName": "Mike", "visitURL": "https://kafka.apache.org", "ts": "07/11/2023 06:39:43"} +{"userName": "Zark", "visitURL": "https://docs.python.org", "ts": "07/11/2023 06:39:44"} +{"userName": "John", "visitURL": "https://www.bing.com/new", "ts": "07/11/2023 06:39:44"} +{"userName": "Mike", "visitURL": "https://hadoop.apache.org", "ts": "07/11/2023 06:39:44"} +{"userName": "Tim", "visitURL": "https://www.bing.com/new", "ts": "07/11/2023 06:39:44"} +..... +``` ++**Create HBase table on HDInsight HBase** ++``` sql +root@hn0-contos:/home/sshuser# hbase shell +SLF4J: Class path contains multiple SLF4J bindings. +SLF4J: Found binding in [jar:file:/usr/hdp/5.1.1.3/hadoop/lib/slf4j-reload4j-1.7.35.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: Found binding in [jar:file:/usr/hdp/5.1.1.3/hbase/lib/client-facing-thirdparty/slf4j-reload4j-1.7.33.jar!/org/slf4j/impl/StaticLoggerBinder.class] +SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. +SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory] +HBase Shell +Use "help" to get list of supported commands. +Use "exit" to quit this interactive shell. 
+For more information, see, http://hbase.apache.org/2.0/book.html#shell +Version 2.4.11.5.1.1.3, rUnknown, Thu Apr 20 12:31:07 UTC 2023 +Took 0.0032 seconds +hbase:001:0> create 'user_click_events','user_info' +Created table user_click_events +Took 5.1399 seconds +=> Hbase::Table - user_click_events +hbase:002:0> +``` ++### Develop the project for submitting jar on Flink ++**create maven project with following pom.xml** ++``` xml +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" + xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <modelVersion>4.0.0</modelVersion> ++ <groupId>contoso.example</groupId> + <artifactId>FlinkHbaseDemo</artifactId> + <version>1.0-SNAPSHOT</version> + <properties> + <maven.compiler.source>1.8</maven.compiler.source> + <maven.compiler.target>1.8</maven.compiler.target> + <flink.version>1.16.0</flink.version> + <java.version>1.8</java.version> + <scala.binary.version>2.12</scala.binary.version> + <hbase.version>2.4.11</hbase.version> + <kafka.version>3.2.0</kafka.version> // Replace with 2.4.0 for HDInsight Kafka 2.4 + </properties> + <dependencies> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-streaming-java</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-clients</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-hbase-base --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-hbase-base</artifactId> + <version>1.16.0</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client --> + <dependency> + <groupId>org.apache.hbase</groupId> + <artifactId>hbase-client</artifactId> + <version>${hbase.version}</version> + </dependency> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-common</artifactId> + <version>3.1.1</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-kafka</artifactId> + <version>${flink.version}</version> + </dependency> + <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-base --> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-base</artifactId> + <version>${flink.version}</version> + </dependency> + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-core</artifactId> + <version>${flink.version}</version> + </dependency> + </dependencies> + <build> + <plugins> + <plugin> + <groupId>org.apache.maven.plugins</groupId> + <artifactId>maven-assembly-plugin</artifactId> + <version>3.0.0</version> + <configuration> + <appendAssemblyId>false</appendAssemblyId> + <descriptorRefs> + <descriptorRef>jar-with-dependencies</descriptorRef> + </descriptorRefs> + </configuration> + <executions> + <execution> + <id>make-assembly</id> + <phase>package</phase> + <goals> + <goal>single</goal> + </goals> + </execution> + </executions> + </plugin> + </plugins> + </build> +</project> +``` 
++**Source code** ++Writing HBase Sink program ++**HBaseWriterSink** +``` java +package contoso.example; ++import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.streaming.api.functions.sink.RichSinkFunction; +import org.apache.hadoop.hbase.HBaseConfiguration; +import org.apache.hadoop.hbase.TableName; +import org.apache.hadoop.hbase.client.*; +import org.apache.hadoop.hbase.util.Bytes; ++public class HBaseWriterSink extends RichSinkFunction<Tuple3<String,String,String>> { + String hbase_zk = "<update-hbasezk-ip>:2181,<update-hbasezk-ip>:2181,<update-hbasezk-ip>:2181"; + Connection hbase_conn; + Table tb; + int i = 0; + @Override + public void open(Configuration parameters) throws Exception { + super.open(parameters); + org.apache.hadoop.conf.Configuration hbase_conf = HBaseConfiguration.create(); + hbase_conf.set("hbase.zookeeper.quorum", hbase_zk); + hbase_conf.set("zookeeper.znode.parent", "/hbase-unsecure"); + hbase_conn = ConnectionFactory.createConnection(hbase_conf); + tb = hbase_conn.getTable(TableName.valueOf("user_click_events")); + } ++ @Override + public void invoke(Tuple3<String,String,String> value, Context context) throws Exception { + byte[] rowKey = Bytes.toBytes(String.format("%010d", i++)); + Put put = new Put(rowKey); + put.addColumn(Bytes.toBytes("user_info"), Bytes.toBytes("userName"), Bytes.toBytes(value.f0)); + put.addColumn(Bytes.toBytes("user_info"), Bytes.toBytes("visitURL"), Bytes.toBytes(value.f1)); + put.addColumn(Bytes.toBytes("user_info"), Bytes.toBytes("ts"), Bytes.toBytes(value.f2)); + tb.put(put); + }; ++ public void close() throws Exception { + if (null != tb) tb.close(); + if (null != hbase_conn) hbase_conn.close(); + } +} +``` ++**main:KafkaSinkToHbase** ++Writing a Kafka Sink to HBase Program ++``` java +package contoso.example; ++import org.apache.flink.api.common.eventtime.WatermarkStrategy; +import org.apache.flink.api.common.serialization.SimpleStringSchema; +import org.apache.flink.api.common.typeinfo.Types; ++import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.connector.kafka.source.KafkaSource; +import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.streaming.api.datastream.DataStreamSource; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; ++public class KafkaSinkToHbase { + public static void main(String[] args) throws Exception { ++ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1); + String kafka_brokers = "10.0.0.38:9092,10.0.0.39:9092,10.0.0.40:9092"; ++ KafkaSource<String> source = KafkaSource.<String>builder() + .setBootstrapServers(kafka_brokers) + .setTopics("click_events") + .setGroupId("my-group") + .setStartingOffsets(OffsetsInitializer.earliest()) + .setValueOnlyDeserializer(new SimpleStringSchema()) + .build(); ++ DataStreamSource<String> kafka = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source").setParallelism(1); + DataStream<Tuple3<String,String,String>> dataStream = kafka.map(line-> { + String[] fields = line.toString().replace("{","").replace("}",""). 
+ replace("\"","").split(","); + Tuple3<String, String,String> tuple3 = Tuple3.of(fields[0].substring(10),fields[1].substring(11),fields[2].substring(5)); + return tuple3; + }).returns(Types.TUPLE(Types.STRING,Types.STRING,Types.STRING)); ++ dataStream.addSink(new HBaseWriterSink()); ++ env.execute("Kafka Sink To Hbase"); + } +} ++``` ++### Submit job on Secure Shell ++We use [Flink CLI](./flink-web-ssh-on-portal-to-flink-sql.md) from Azure portal to submit jobs +++### Monitor job on Flink UI ++We can monitor the jobs on Flink Web UI +++## Validate HBase table data ++``` +hbase:001:0> scan 'user_click_events' +ROW COLUMN+CELL + 0000000853 column=user_info:ts, timestamp=2023-07-11T06:50:08.505, value=07/11/2023 06:39:44 + 0000000853 column=user_info:userName, timestamp=2023-07-11T06:50:08.505, value=Sean + 0000000853 column=user_info:visitURL, timestamp=2023-07-11T06:50:08.505, value=https://kafka.apache.org + 0000000854 column=user_info:ts, timestamp=2023-07-11T06:50:08.556, value=07/11/2023 06:39:45 + 0000000854 column=user_info:userName, timestamp=2023-07-11T06:50:08.556, value=Pick + 0000000854 column=user_info:visitURL, timestamp=2023-07-11T06:50:08.556, value=https://www.bing.com/new + 0000000855 column=user_info:ts, timestamp=2023-07-11T06:50:08.609, value=07/11/2023 06:39:45 + 0000000855 column=user_info:userName, timestamp=2023-07-11T06:50:08.609, value=Pick + 0000000855 column=user_info:visitURL, timestamp=2023-07-11T06:50:08.609, value=https://kafka.apache.org + 0000000856 column=user_info:ts, timestamp=2023-07-11T06:50:08.663, value=07/11/2023 06:39:45 + 0000000856 column=user_info:userName, timestamp=2023-07-11T06:50:08.663, value=Andrew + 0000000856 column=user_info:visitURL, timestamp=2023-07-11T06:50:08.663, value=https://hadoop.apache.org + 0000000857 column=user_info:ts, timestamp=2023-07-11T06:50:08.714, value=07/11/2023 06:39:45 + 0000000857 column=user_info:userName, timestamp=2023-07-11T06:50:08.714, value=Machael + 0000000857 column=user_info:visitURL, timestamp=2023-07-11T06:50:08.714, value=https://flink.apache.org + 0000000858 column=user_info:ts, timestamp=2023-07-11T06:50:08.767, value=07/11/2023 06:39:45 + 0000000858 column=user_info:userName, timestamp=2023-07-11T06:50:08.767, value=Luke + 0000000858 column=user_info:visitURL, timestamp=2023-07-11T06:50:08.767, value=/azure/ + hdinsight/hdinsight-overview +859 row(s) +Took 0.9531 seconds +``` ++> [!NOTE] +> - FlinkKafkaConsumer is deprecated and removed with Flink 1.17, use KafkaSource instead. +> - FlinkKafkaProducer is deprecated and removed with Flink 1.15, use KafkaSink instead. ++## References +* [Apache Kafka Connector](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka) +* [Download IntelliJ IDEA](https://www.jetbrains.com/idea/download/#section=windows) |
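The `KafkaSinkToHbase` sample above turns each JSON record into a `Tuple3` by stripping braces and splitting on commas, which breaks as soon as a field value contains a comma or the field order changes. A more robust sketch parses the JSON instead; it assumes you add a JSON library such as `com.fasterxml.jackson.core:jackson-databind` to the pom.xml (it isn't there today):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;

public class JsonToTuple3 extends RichMapFunction<String, Tuple3<String, String, String>> {
    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) {
        // ObjectMapper isn't serializable, so create it per task instance in open().
        mapper = new ObjectMapper();
    }

    @Override
    public Tuple3<String, String, String> map(String value) throws Exception {
        JsonNode node = mapper.readTree(value);
        return Tuple3.of(
                node.get("userName").asText(),
                node.get("visitURL").asText(),
                node.get("ts").asText());
    }
}
```

In the job, replace the string-splitting lambda with `kafka.map(new JsonToTuple3())`; the rest of the pipeline, including `HBaseWriterSink`, stays the same.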
hdinsight-aks | Use Hive Catalog | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-hive-catalog.md | + + Title: Use Hive Catalog, Hive Read & Write demo on Apache Flink SQL +description: Learn how to use Hive Catalog, Hive Read & Write demo on Apache Flink SQL ++ Last updated : 08/29/2023+++# How to use Hive Catalog with Apache Flink SQL +++This example uses HiveΓÇÖs Metastore as a persistent catalog with Apache FlinkΓÇÖs HiveCatalog. We will use this functionality for storing Kafka table and MySQL table metadata on Flink across sessions. Flink uses Kafka table registered in Hive Catalog as a source, perform some lookup and sink result to MySQL database +++## Prerequisites ++* [HDInsight on AKS Flink 1.16.0 with Hive Metastore 3.1.2](../flink/flink-create-cluster-portal.md) +* [HDInsight Kafka](../../hdinsight/kafk) + * You're required to ensure the network settings are complete as described on [Using HDInsight Kafka](../flink/process-and-consume-data.md); that's to make sure HDInsight on AKS Flink and HDInsight Kafka are in the same VNet +* MySQL 8.0.33 ++## Apache Hive on Flink ++Flink offers a two-fold integration with Hive. ++- The first step is to use Hive Metastore (HMS) as a persistent catalog with FlinkΓÇÖs HiveCatalog for storing Flink specific metadata across sessions. + - For example, users can store their Kafka or ElasticSearch tables in Hive Metastore by using HiveCatalog, and reuse them later on in SQL queries. +- The second is to offer Flink as an alternative engine for reading and writing Hive tables. +- The HiveCatalog is designed to be ΓÇ£out of the boxΓÇ¥ compatible with existing Hive installations. You don't need to modify your existing Hive Metastore or change the data placement or partitioning of your tables. ++You may refer to this page for more details on [Apache Hive](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/hive/overview/) ++## Environment preparation ++### Create an Apache Flink cluster with HMS ++Lets create an Apache Flink cluster with HMS on Azure portal, you can refer to the detailed instructions on [Flink cluster creation](../flink/flink-create-cluster-portal.md). +++After cluster creation, check HMS is running or not on AKS side. +++### Prepare user order transaction data Kafka topic on HDInsight ++Download the kafka client jar using the following command: ++`wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.12-3.2.0.tgz` ++Untar the tar file with ++`tar -xvf kafka_2.12-3.2.0.tgz` ++Produce the messages to the Kafka topic. 
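If you'd rather produce a few test orders from Java instead of piping a script into the console producer shown in the commands that follow, here's a minimal sketch. It assumes the `org.apache.kafka:kafka-clients` dependency matching your HDInsight Kafka version, and it reuses the broker and topic names from this article; replace them with your own.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserOrdersProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "wn0-contsk:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // One sample order in the same JSON shape used later in this article.
        String order = "{\"user_id\": \"0001\", \"user_name\": \"Jark\", \"user_email\": \"user1@example.com\", "
                + "\"order_date\": \"07/16/2023 10:08:22\", \"price\": \"50.00000\", "
                + "\"product_id\": \"102\", \"order_status\": false}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records are flushed when the producer is closed at the end of the try block.
            producer.send(new ProducerRecord<>("user_orders", order));
        }
    }
}
```
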
+++Other commands: +> [!NOTE] +> You're required to replace bootstrap-server with your own kafka brokers host name or IP +``` + delete topic +./kafka-topics.sh --delete --topic user_orders --bootstrap-server wn0-contsk:9092 ++ create topic +./kafka-topics.sh --create --replication-factor 2 --partitions 3 --topic user_orders --bootstrap-server wn0-contsk:9092 ++ produce topic +./kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic user_orders ++ consumer topic +./kafka-console-consumer.sh --bootstrap-server wn0-contsk:9092 --topic user_orders --from-beginning +``` ++### Prepare user order master data on MySQL on Azure ++Testing DB: ++++**Prepare the order table:** ++``` SQL +mysql> use mydb +Reading table information for completion of table and column names +You can turn off this feature to get a quicker startup with -A ++mysql> CREATE TABLE orders ( + order_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, + order_date DATETIME NOT NULL, + customer_id INTEGER NOT NULL, + customer_name VARCHAR(255) NOT NULL, + price DECIMAL(10, 5) NOT NULL, + product_id INTEGER NOT NULL, + order_status BOOLEAN NOT NULL +) AUTO_INCREMENT = 10001; +++mysql> INSERT INTO orders +VALUES (default, '2023-07-16 10:08:22','0001', 'Jark', 50.00, 102, false), + (default, '2023-07-16 10:11:09','0002', 'Sally', 15.00, 105, false), + (default, '2023-07-16 10:11:09','000', 'Sally', 25.00, 105, false), + (default, '2023-07-16 10:11:09','0004', 'Sally', 45.00, 105, false), + (default, '2023-07-16 10:11:09','0005', 'Sally', 35.00, 105, false), + (default, '2023-07-16 12:00:30','0006', 'Edward', 90.00, 106, false); ++mysql> select * from orders; ++-++-++-++--++| order_id | order_date | customer_id | customer_name | price | product_id | order_status | ++-++-++-++--++| 10001 | 2023-07-16 10:08:22 | 1 | Jark | 50.00000 | 102 | 0 | +| 10002 | 2023-07-16 10:11:09 | 2 | Sally | 15.00000 | 105 | 0 | +| 10003 | 2023-07-16 10:11:09 | 3 | Sally | 25.00000 | 105 | 0 | +| 10004 | 2023-07-16 10:11:09 | 4 | Sally | 45.00000 | 105 | 0 | +| 10005 | 2023-07-16 10:11:09 | 5 | Sally | 35.00000 | 105 | 0 | +| 10006 | 2023-07-16 12:00:30 | 6 | Edward | 90.00000 | 106 | 0 | ++-++-++-++--++6 rows in set (0.22 sec) ++mysql> desc orders; +++++--++-++| Field | Type | Null | Key | Default | Extra | +++++--++-++| order_id | int | NO | PRI | NULL | auto_increment | +| order_date | datetime | NO | | NULL | | +| customer_id | int | NO | | NULL | | +| customer_name | varchar(255) | NO | | NULL | | +| price | decimal(10,5) | NO | | NULL | | +| product_id | int | NO | | NULL | | +| order_status | tinyint(1) | NO | | NULL | | +++++--++-++7 rows in set (0.22 sec) +``` ++### Using SSH download required Kafka connector and MySQL Database jars +++> [!NOTE] +> Download the correct version jar according to our HDInsight kafka version and MySQL version. ++``` +wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-jdbc/1.16.0/flink-connector-jdbc-1.16.0.jar +wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar +wget https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.2.0/kafka-clients-3.2.0.jar +wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-kafka/1.16.0/flink-connector-kafka-1.16.0.jar +``` ++**Moving the planner jar** ++Move the jar flink-table-planner_2.12-1.16.0-0.0.18.jar located in webssh pod's /opt to /lib and move out the jar flink-table-planner-loader-1.16.0-0.0.18.jar from /lib. 
Please refer to [issue](https://issues.apache.org/jira/browse/FLINK-25128) for more details. Perform the following steps to move the planner jar. ++``` +mv /opt/flink-webssh/opt/flink-table-planner_2.12-1.16.0-0.0.18.jar /opt/flink-webssh/lib/ +mv /opt/flink-webssh/lib/flink-table-planner-loader-1.16.0-0.0.18.jar /opt/flink-webssh/opt/ +``` ++> [!NOTE] +> **An extra planner jar moving is only needed when using Hive dialect or HiveServer2 endpoint**. However, this is the recommended setup for Hive integration. ++## Validation +### Use bin/sql-client.sh to connect to Flink SQL ++``` +bin/sql-client.sh -j kafka-clients-3.2.0.jar -j flink-connector-kafka-1.16.0.jar -j flink-connector-jdbc-1.16.0.jar -j mysql-connector-j-8.0.33.jar +``` ++### Create Hive catalog and connect to the hive catalog on Flink SQL ++> [!NOTE] +> As we already use Flink cluster with Hive Metastore, there is no need to perform any additional configurations. ++``` SQL +CREATE CATALOG myhive WITH ( + 'type' = 'hive' +); ++USE CATALOG myhive; +``` ++### Create Kafka Table on Apache Flink SQL ++``` SQL +CREATE TABLE kafka_user_orders ( + `user_id` BIGINT, + `user_name` STRING, + `user_email` STRING, + `order_date` TIMESTAMP(3) METADATA FROM 'timestamp', + `price` DECIMAL(10,5), + `product_id` BIGINT, + `order_status` BOOLEAN +) WITH ( + 'connector' = 'kafka', + 'topic' = 'user_orders', + 'scan.startup.mode' = 'latest-offset', + 'properties.bootstrap.servers' = '10.0.0.38:9092,10.0.0.39:9092,10.0.0.40:9092', + 'format' = 'json' +); ++select * from kafka_user_orders; +``` ++### Create MySQL Table on Apache Flink SQL ++``` SQL +CREATE TABLE mysql_user_orders ( + `order_id` INT, + `order_date` TIMESTAMP, + `customer_id` INT, + `customer_name` STRING, + `price` DECIMAL(10,5), + `product_id` INT, + `order_status` BOOLEAN +) WITH ( + 'connector' = 'jdbc', + 'url' = 'jdbc:mysql://<servername>.mysql.database.azure.com/mydb', + 'table-name' = 'orders', + 'username' = '<username>', + 'password' = '<password>' +); ++select * from mysql_user_orders; +``` ++### Check tables registered in above Hive catalog on Flink SQL ++++### Sink user transaction order info into master order table in MySQL on Flink SQL ++``` SQL +INSERT INTO mysql_user_orders (order_date, customer_id, customer_name, price, product_id, order_status) + SELECT order_date, CAST(user_id AS INT), user_name, price, CAST(product_id AS INT), order_status + FROM kafka_user_orders; +``` +++### Check if user transaction order data on Kafka is added in master table order in MySQL on Azure Cloud Shell ++++### Creating three more user orders on Kafka ++``` +sshuser@hn0-contsk:~$ /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server wn0-contsk:9092 --topic user_orders +>{"user_id": null,"user_name": "Lucy","user_email": "user8@example.com","order_date": "07/17/2023 21:33:44","price": "90.00000","product_id": "102","order_status": false} +>{"user_id": "0009","user_name": "Zark","user_email": "user9@example.com","order_date": "07/17/2023 21:52:07","price": "80.00000","product_id": "103","order_status": true} +>{"user_id": "0010","user_name": "Alex","user_email": "user10@example.com","order_date": "07/17/2023 21:52:07","price": "70.00000","product_id": "104","order_status": true} +``` ++### Check Kafka table data on Flink SQL +``` SQL +Flink SQL> select * from kafka_user_orders; +``` +++### Insert `product_id=104` into orders table on MySQL on Flink SQL ++``` SQL +INSERT INTO mysql_user_orders (order_date, customer_id, customer_name, price, product_id, 
order_status) +SELECT order_date, CAST(user_id AS INT), user_name, price, CAST(product_id AS INT), order_status +FROM kafka_user_orders where product_id = 104; +``` ++### Check `product_id = 104` record is added in order table on MySQL on Azure Cloud Shell +++### Reference +* [Apache Hive](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/hive/overview/) |
hdinsight-aks | Use Hive Metastore Datastream | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/flink/use-hive-metastore-datastream.md | + + Title: Use Hive Metastore with Apache Flink DataStream API +description: Use Hive Metastore with Apache Flink DataStream API ++ Last updated : 08/29/2023+++# Use Hive Metastore with Apache Flink DataStream API +++Over the years, Hive Metastore has evolved into a de facto metadata center in the Hadoop ecosystem. Many companies have a separate Hive Metastore service instance in their production environments to manage all their metadata (Hive or non-Hive metadata). For users who have both Hive and Flink deployments, HiveCatalog enables them to use Hive Metastore to manage Flink's metadata. +++## Supported Hive versions for HDInsight on AKS - Apache Flink ++Supported Hive versions: +- 3.1 + - 3.1.0 + - 3.1.1 + - 3.1.2 + - 3.1.3 ++If you're building your own program, you need the following dependencies in your Maven pom.xml file. It's **not recommended** to include these dependencies in the resulting jar file. You're supposed to add the dependencies at runtime. ++``` +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-sql-connector-hive-3.1.2_2.12</artifactId> + <version>1.16.0</version> + <scope>provided</scope> +</dependency> ++<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table-api-java-bridge_2.12</artifactId> + <version>1.16.0</version> + <scope>provided</scope> +</dependency> +``` ++## Connect to Hive ++This example illustrates various snippets of connecting to Hive. With HDInsight on AKS - Flink, you're required to use `/opt/hive-conf` as the Hive configuration directory to connect to the Hive Metastore. ++``` +public static void main(String[] args) throws Exception + { + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + // start Table Environment + StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); + env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); + String catalogName = "myhive"; + String defaultDatabase = HiveCatalog.DEFAULT_DB; + String hiveConfDir = "/opt/hive-conf"; + HiveCatalog hive = new HiveCatalog(catalogName, defaultDatabase, hiveConfDir); + // register HiveCatalog in the tableEnv + tableEnv.registerCatalog("myhive", hive); + // set the HiveCatalog as the current catalog of the session + tableEnv.useCatalog("myhive"); + // Create a table in hive catalog + tableEnv.executeSql("create table MyTable (a int, b bigint, c varchar(32)) with ('connector' = 'filesystem', 'path' = '/non', 'format' = 'csv')"); + // Create a view in hive catalog + tableEnv.executeSql("create view MyView as select * from MyTable"); + } +``` ++## References +[Apache Flink - Hive read & write](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/hive/hive_read_write/) |
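If you also want to consume a Hive-cataloged table from the DataStream API (for example, the `MyTable` registered in the snippet above), you can convert it through the Table API bridge. A minimal sketch that continues inside the same `main` method, after `tableEnv.useCatalog("myhive")`:

```java
// Additional imports for the conversion:
// import org.apache.flink.streaming.api.datastream.DataStream;
// import org.apache.flink.table.api.Table;
// import org.apache.flink.types.Row;

// Look up the table from the current (Hive) catalog and turn it into a DataStream of Rows.
Table myTable = tableEnv.from("MyTable");
DataStream<Row> rows = tableEnv.toDataStream(myTable);

rows.print();
env.execute("Read a Hive-cataloged table with the DataStream API");
```
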
hdinsight-aks | Get Started | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/get-started.md | + + Title: One-click deployment for Azure HDInsight on AKS (Preview) +description: How to create cluster pool and cluster with one-click deployment on Azure HDInsight on AKS. ++ Last updated : 08/29/2023+++# Get started with one-click deployment (Preview) +++One-click deployments are designed for users to experience zero touch creation of HDInsight on AKS. It eliminates the need to manually perform certain steps. +This article describes how to use readily available ARM templates to create a cluster pool and cluster in few clicks. ++> [!NOTE] +> - These ARM templates cover the basic requirements to create a cluster pool and cluster along with prerequisite resources. To explore advanced options, see [Create cluster pool and clusters](quickstart-create-cluster.md). +> - Necessary resources are created as part of the ARM template deployment in your resource group. For more information, see [Resource prerequisites](prerequisites-resources.md). +> - The user must have permission to create new resources and assign roles to the resources in the subscription to deploy these ARM templates. +> - Before you begin with ARM templates, please keep [object ID ready](#find-object-id-of-an-identity) for the identity you are going to use for deployment. ++|Workload|Template|Description| +|||| +|Trino| [![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FoneClickTrino.json) | Creates cluster pool and cluster **without** HMS, custom VNet, and Monitoring capability.| +|Flink|[![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FoneClickFlink.json) | Creates cluster pool and cluster **without** HMS, custom VNet, and Monitoring capability.| +|Spark| [![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FoneClickSpark.json) | Creates cluster pool and cluster **without** HMS, custom VNet, and Monitoring capability.| +|Trino|[![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FoneClickTrino_WithVnet.json) | Creates cluster pool and cluster with an existing custom VNet.| +|Flink| [![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FoneClickFlink_WithVnet.json) | Creates cluster pool and cluster with an existing custom VNet.| +|Spark| [![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FoneClickSpark_WithVnet.json)| Creates cluster pool and cluster with an existing custom VNet.| ++When you click on one of these templates, it launches Custom deployment page in the Azure portal. 
You need to provide the details for the following parameters based on the template used. ++|**Property**|**Description**| +||| +|Subscription| Select the Azure subscription in which resources are to be created.| +|Resource Group|Create a new resource group, or select the resource group in your subscription from the drop-down list under which resources are to be created.| +|Region|Select the region where the resource group is deployed.| +|Cluster Pool Name| Enter the name of the cluster pool to be created. Cluster pool name length can't be more than 26 characters. It must start with an alphabet, end with an alphanumeric character, and must only contain alphanumeric characters and hyphens.| +|Cluster Pool Version| Select the HDInsight on AKS cluster pool version. | +|Cluster Pool Node VM Size|From the drop-down list, select the virtual machine size for the cluster pool based on your requirement.| +|Location|Select the region where the cluster and necessary resources are to be deployed.| +|Resource Prefix|Provide a prefix for creating necessary resources for cluster creation, resources are named as [prefix + predefined string].| +|Cluster Name |Enter the name of the new cluster.| +|HDInsight on AKS Version | Select the minor or patch version of the HDInsight on AKS of the new cluster. For more information, see [versioning](./versions.md).| +|Cluster Node VM Size |Provide the VM size for the cluster. For example: Standard_D8ds_v5.| +|Cluster OSS Version |Provide the cluster type supported OSS version in three part naming format. For example: Trino - 0.410.0, Flink - 1.16.0, Spark - 3.3.1| +|Custom VNet Name |Provide custom virtual network to be associated with the cluster pool. It should be in the same resource group as your cluster pool. | +|Subnet Name in Custom Vnet |Provide subnet name defined in your custom virtual network. | +|User Object ID| Provide user alias object ID from Microsoft Entra ID [(Azure Active Directory)](https://www.microsoft.com/security/business/identity-access/azure-active-directory).| + + ### Find Object ID of an identity + + 1. In the top search bar in the Azure portal, enter your user ID. (For example, john@contoso.com) ++ :::image type="content" source="./media/get-started/search-object-id.png" alt-text="Screenshot showing how to search object ID."::: + + 2. From Azure Active Directory box, click on your user ID. + + :::image type="content" source="./media/get-started/view-object-id.png" alt-text="Screenshot showing how to view object ID."::: + + 1. Copy the Object ID. + + ### Deploy + + 1. Select **Next: Review + create** to continue. + 1. On the **Review + create** page, based on validation status, continue to click **Create**. ++ :::image type="content" source="./media/get-started/custom-deployment-summary.png" alt-text="Screenshot showing custom deployment summary."::: ++ The **Deployment is in progress** page is displayed while the resources are getting created, and the **"Your deployment is complete"** page is displayed once the cluster pool and cluster are fully deployed and ready for use. ++ :::image type="content" source="./media/get-started/custom-deployment-complete.png" alt-text="Screenshot showing custom deployment complete."::: ++ ++ If you navigate away from the page, you can check the status of the deployment by clicking Notifications :::image type="icon" source="./media/get-started/notifications.png" alt-text="Screenshot showing notifications icon in the Azure portal."::: in the Azure portal. 
+ + > [!TIP] + > + > For troubleshooting any deployment errors, you can refer to this [page](./create-cluster-error-dictionary.md). + |
hdinsight-aks | Hdinsight Aks Support Help | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/hdinsight-aks-support-help.md | + + Title: Support and troubleshooting for HDInsight on AKS +description: This article provides support and troubleshooting options for HDInsight on AKS. ++ Last updated : 10/06/2023+++# Support and troubleshooting for HDInsight on AKS ++## Self help troubleshooting +++The [HDInsight on AKS troubleshooting documentation](./create-cluster-error-dictionary.md) provides guidance for how to diagnose and resolve issues that you might encounter when using HDInsight on AKS. These articles cover how to troubleshoot deployment failures, connection issues, and more. ++For specific component pages, you can always refer: ++- [Flink](./flink/flink-cluster-configuration.md) +- [Trino](./trino/trino-configuration-troubleshoot.md) ++## Post a question on Microsoft Q&A +++Azure's preferred destination for community support, [Microsoft Q&A](/answers/products/azure), allows you to ask technical questions and engage with Azure engineers, Most Valuable Professionals (MVPs), partners, and customers. When you ask a question, make sure you use the `HDInsight` tag. You can also submit your own answers and help other community members with their questions. ++- [Microsoft Q&A for HDInsight on AKS](/answers/tags/453/azure-hdinsight-aks) ++If you can't find an answer to your problem using search, you can submit a new question to Microsoft Q&A and tag it with the appropriate Azure service and area. ++The following table lists the tags for HDInsight on AKS and related ++| Area | Tag | +|-|-| +| [Azure Kubernetes Service](/azure/aks/intro-kubernetes) | [azure-kubernetes-service](/answers/topics/azure-kubernetes-service.html)| +| [Azure HDInsight on AKS](./overview.md) | [azure-hdinsight-aks](/answers/topics/azure-hdinsight-aks.html) | +| [Azure storage accounts](/azure/storage/common/storage-account-overview) | [azure-storage-accounts](/answers/topics/azure-storage-accounts.html)| +| [Azure Managed Identities](/azure/active-directory/managed-identities-azure-resources/overview) | [azure-managed-identity](/answers/topics/azure-managed-identity.html) | +| [Azure RBAC](/azure/role-based-access-control/overview) | [azure-rbac](/answers/topics/azure-rbac.html)| +| [Azure Active Directory](/azure/active-directory/fundamentals/whatis) | [azure-active-directory](/answers/topics/azure-active-directory.html)| +| [Azure Virtual Network](/azure/virtual-network/network-overview) | [azure-virtual-network](/answers/topics/azure-virtual-network.html)| ++## Create an Azure support request +++Explore the range of [Azure support options](https://azure.microsoft.com/support/plans) and choose a plan that best fits your needs. Azure customers can create and manage support requests in the Azure portal. ++If you already have an Azure Support Plan, you can [open a support request](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest). ++## Create a GitHub issue +++If you need help with the languages and tools for developing and managing HDInsight on AKS, you can open an issue in its GitHub repository. 
++The following table lists the GitHub repositories for HDInsight on AKS and related ++| Library | GitHub issues URL| +| | | +| Azure PowerShell | https://github.com/Azure/azure-powershell/issues | +| Azure CLI | https://github.com/Azure/azure-cli/issues | +| Azure REST API | https://github.com/Azure/azure-rest-api-specs/issues | +| Azure SDK for Java | https://github.com/Azure/azure-sdk-for-java/issues | +| Azure SDK for Python | https://github.com/Azure/azure-sdk-for-python/issues | +| Azure SDK for .NET | https://github.com/Azure/azure-sdk-for-net/issues | +| Azure SDK for JavaScript | https://github.com/Azure/azure-sdk-for-js/issues | +| Terraform | https://github.com/Azure/terraform/issues | ++## Stay informed of updates and new releases +++Learn about important product updates, roadmap, and announcements in [Azure Updates](https://azure.microsoft.com/updates/?query=HDInsight), [Release notes](./release-notes/hdinsight-aks-release-notes.md) and [Social Channels](https://www.linkedin.com/groups/14313521/). ++## Next steps ++Visit the [HDInsight on AKS documentation](./index.yml). |
hdinsight-aks | Hdinsight On Aks Autoscale Clusters | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/hdinsight-on-aks-autoscale-clusters.md | + + Title: Automatically scale Azure HDInsight on AKS clusters +description: Use the Auto scale feature to automatically scale Azure HDInsight clusters on AKS based on a schedule or load based metrics. ++ Last updated : 08/29/2023+++# Auto Scale HDInsight on AKS Clusters +++The sizing of any cluster to meet job performance and manage costs ahead of time is always tricky, and hard to determine! One of the lucrative benefits of building data lake house over Cloud is its elasticity, which means to use autoscale feature to maximize the utilization of resources at hand. Auto scale with Kubernetes is one key to establishing a cost optimized ecosystem. With varied usage patterns in any enterprise, there could be variations in cluster loads over time that could lead to clusters being under-provisioned (lousy performance) or overprovisioned (unnecessary costs due to idle resources). ++The autoscale feature offered in HDInsight on AKS can automatically increase or decrease the number of worker nodes in your cluster. Auto scale uses the cluster metrics and scaling policy used by the customers. ++This feature is well suited for mission-critical workloads, which may have +- Variable or unpredictable traffic patterns and require SLAs on high performance and scale or +- Predetermined schedule for required worker nodes to be available to successfully execute the jobs on the cluster. ++Auto Scale with HDInsight on AKS Clusters makes the clusters cost efficient, and elastic on Azure. ++With Auto scale, customers can scale down clusters without affecting workloads. It's enabled with advanced capabilities such as graceful decommissioning and cooling period. These capabilities empower users to make informed choices on addition and removal of nodes based on the current load of the cluster. ++## How it works ++This feature works by scaling the number of nodes within preset limits based on cluster metrics or a defined schedule of scale-up and scale-down operations. There are two types of conditions to trigger autoscale events: threshold-based triggers for various cluster performance metrics (called load-based scaling) and time-based triggers (called schedule-based scaling). ++Load-based scaling changes the number of nodes in your cluster, within a range that you set, to ensure optimal CPU usage and minimize running cost. ++Schedule-based scaling changes the number of nodes in your cluster based on a schedule of scale-up and scale-down operations. ++> [!NOTE] +> Auto scale does not support changing SKU type of an existing cluster. ++### Cluster compatibility ++The following table describes the cluster types that are compatible with the Auto scale feature, and whatΓÇÖs available or planned. ++|Workload |Load Based |Schedule Base| +|-|-|-| +|Flink |Planned |Yes| +|Trino |Planned |Yes**| +|Spark |Yes** |Yes**| ++**Graceful decommissioning is configurable. ++## Scaling Methods ++* **Schedule-based scaling**: + * When your jobs are expected to run on fixed schedules and for a predictable duration or when you anticipate low usage during specific times of the day For example, test and dev environments in post-work hours, end-of day jobs. + + :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/schedule-based-concept-step-1.png" alt-text="Screenshot showing how to select schedule-based-scaling." 
border="true" lightbox="./media/hdinsight-on-aks-autoscale-clusters/schedule-based-concept-step-1.png"::: ++* **Load based scale**: + * When the load patterns fluctuate substantially and unpredictably during the day, for example, Order data processing with random fluctuations in load patterns based on various factors. + + :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/load-based-concept-step-2.png" alt-text="Screenshot showing how to select load based scaling." border="true" lightbox="./media/hdinsight-on-aks-autoscale-clusters/load-based-concept-step-2.png"::: ++ With the new, configure scale rule option, you can now customize the scale rules. ++ :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/configure-scale-rule-concept-step-3.png" alt-text="Screenshot showing how to configure scale rule in load based scaling."::: + + :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/configure-scale-rule-add-rule-concept-step-4.png" alt-text="Screenshot showing how to add rules in configure scale rules for load based scaling."::: ++ > [!TIP] + > * Scale Up rules take precedence when one or more rules are triggered. Even if only one of the rules for scale up suggest cluster being under-provisioned, cluster will try to scale up. For scale down to happen, no scale up rule should be triggered. ++### Load-based scale conditions ++When the following conditions are detected, Auto scale issues a scale request ++|Scale-up|Scale-down| +|-|-| +|Allocated Cores are greater than 80% for 5-minutes poll interval (1-minute check period)|Allocated Cores are less than or equal to 20% for 5-minutes poll interval (1-minute check period) | ++* For scale-up, Auto scale issues a scale-up request to add the required number of nodes. The scale-up is based on how many new worker nodes are needed to meet the current CPU and memory requirements. This value is capped to maximum number of worker nodes set. + +* For scale-down, Auto scale issues a request to remove some nodes. The scale-down considerations include the number of pods per node, the current CPU and memory requirements, and worker nodes, which are candidates for removal based on current job execution. The scale down operation first decommissions the nodes, and then removes them from the cluster. ++ > [!IMPORTANT] + > The Auto scale Rule Engine proactively flushes old events every **30 minutes** to optimize system memory. As a result, there exists an upper bound limit of 30 minutes on the scaling rule interval. To ensure the consistent and reliable triggering of scaling actions, it's imperative to set the scaling rule interval to a value which is lesser than the limit. By adhering to this guideline, you can guarantee a smooth and efficient scaling process while effectively managing system resources. 
++#### Cluster metrics ++Auto scale continuously monitors the cluster and collects the following metrics for Load based autoscale: ++Cluster Metrics Available for Scaling Purposes ++|Metric|Description| +|-|-| +|Available Cores Percentage|The total number of cores available in the cluster compared to the total number of cores in the cluster.| +|Available Memory Percentage|The total memory (in MB) available in the cluster compared to the total amount of memory in the cluster.| +|Allocated Cores Percentage|The total number of cores allocated in the cluster compared to the total number of cores in the cluster.| +|Allocated Memory Percentage|The amount of memory allocated in the cluster compared to the total amount of memory in the cluster.| ++By default, the above metrics are checked every **300 seconds**, it is also configurable when you customize the poll interval with customize autoscale option. Auto scale makes scale-up or scale-down decisions based on these metrics. ++> [!NOTE] +> By default Auto scale uses default resource calculator for YARN for Apache Spark. Load based scaling is available for Apache Spark Clusters. +++#### Graceful Decommissioning ++Enterprises need ways to achieve petabyte scale with autoscaling and to decommission resources gracefully when they're no longer needed. In such scenario, graceful decommissioning feature comes handy. ++Graceful decommissioning allows jobs to complete even after autoscale has triggered decommissioning of the worker nodes. This feature allows nodes to continue to be provisioned until jobs are complete. ++ - **Trino** : Workers have Graceful Decommission enabled by default. Coordinator allows terminating worker to finish its tasks for configured amount of time before removing the worker from the cluster. You can configure the timeout either using native Trino parameter `shutdown.grace-period`, or on Azure portal service configuration page. ++ - **Apache Spark** : Scaling down may impact/stop any running jobs in the cluster. If you enable Graceful Decommissioning settings on the Azure portal, it incorporates Graceful Decommission of YARN Nodes and ensures that any work in progress on a worker node is complete before the node removed from the HDInsight on AKS cluster. ++##### Cool down period ++To avoid continuous scale up operations, autoscale engine waits for a configurable interval before initiating another set of scale up operations. +The default value is set to **180 seconds** ++> [!Note] +> * In custom scale rules, no rule trigger can have a trigger interval greater than 30 minutes. After an auto scaling event occurs, the amount of time to wait before enforcing another scaling policy. +> * Cool down period should be greater than policy interval, so the cluster metrics can get reset. +++## Get started ++1. For autoscale to function, you're required to assign the **owner** or **contributor** permission to the MSI (used during cluster creation) at the cluster level, using IAM on the left pane. ++1. Refer to the following illustration and steps listed on how to add role assignment ++ :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/add-permissions-concept-step-5.png" alt-text="Screenshot showing how to add role assignment." border="true" lightbox="./media/hdinsight-on-aks-autoscale-clusters/add-permissions-concept-step-5.png"::: + +1. Select the **add role assignment**, + 1. Assignment type: Privileged administrator roles + 1. Role: **Owner** or **Contributor** + 1. 
Members: Choose Managed identity and select the **User-assigned managed identity**, which was given during cluster creation phase. + 1. Assign the role. ++### Create a cluster with Schedule based Auto scale ++1. Once your cluster pool is created, create a [new cluster](./quickstart-create-cluster.md) with your desired workload (on the Cluster type), and complete the other steps as part of the normal cluster creation process. +1. On the **Configuration** tab, enable **Auto scale** toggle. +1. Select **Schedule based** autoscale +1. Select your timezone and then click **+ Add rule** +1. Select the days of the week that the new condition should apply to. +1. Edit the time the condition should take effect and the number of nodes that the cluster should be scaled to. ++ :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/schedule-based-get-started-step-6.png" alt-text="Screenshot showing how to get started with schedule based autoscale." border="true" lightbox="./media/hdinsight-on-aks-autoscale-clusters/schedule-based-get-started-step-6.png"::: ++ > [!NOTE] + > * User should have ΓÇ£ownerΓÇ¥ or ΓÇ£contributorΓÇ¥ role on the cluster MSI for autoscale to work. + > * The default value defines the initial size of the cluster when it's created. + > * The difference between two schedules is set to default by 30 minutes. + > * The time value follows 24-hour format + > * In case of a continuous window of beyond 24 hours across days, you're required to set Auto scale schedule across days, and autoscale assumes 23:59 as 00:00 (with same node count) spanning across two days from 22:00 to 23:59, 00:00 to 02:00 as 22:00 to 02:00. + > * The schedules are set in Coordinated Universal Time (UTC), by default. You can always update to time zone that corresponds to your local time zone in the drop down available. When you are on a time zone that observes Daylight Savings, the schedule does not adjust automatically, you are required to manage the schedule updates accordingly. ++### Create a cluster with Load based Auto scale ++1. Once your cluster pool is created, create a [new cluster](./quickstart-create-cluster.md) with your desired workload (on the Cluster type), and complete the other steps as part of the normal cluster creation process. +1. On the **Configuration** tab, enable **Auto scale** toggle. +1. Select **Load based** autoscale +1. Based on the type of workload, you have options to add **graceful decommission timeout**, **cool down period** +1. Select the **minimum** and **maximum** nodes, and if necessary **configure the scale rules** to customize Auto scale to your needs. ++ :::image type="content" source="./media/hdinsight-on-aks-autoscale-clusters/load-based-get-started-step-7.png" alt-text="Screenshot showing how to get started with load based autoscale." border="true" lightbox="./media/hdinsight-on-aks-autoscale-clusters/load-based-get-started-step-7.png"::: ++ > [!TIP] + > * Your subscription has a capacity quota for each region. The total number of cores of your head nodes and the maximum worker nodes can't exceed the capacity quota. However, this quota is a soft limit; you can always create a support ticket to get it increased easily. + > * If you exceed the total core quota limit, You'll receive an error message saying `The maximum node count you can select is {maxCount} due to the remaining quota in the selected subscription ({remaining} cores)`. + > * Scale Up rules take precedence when one or more rules are triggered. 
Even if only one of the rules for scale up suggest cluster being under-provisioned, cluster will try to scale up. For scale down to happen, no scale up rule should be triggered. + > * The maximum number of nodes allowed in a cluster pool is 250 in public preview. ++### Create a cluster with a Resource Manager template ++**Schedule based auto scale** ++You can create an HDInsight on AKS cluster with schedule-based Autoscaling using an Azure Resource Manager template, by adding an autoscale to the clusterProfile -> autoscaleProfile section. ++The autoscale node contains a recurrence that has a timezone and schedule that describes when the change takes place. For a complete Resource Manager template, see sample JSON ++```json +{ + "autoscaleProfile": { + "enabled": true, + "autoscaleType": "ScheduleBased", + "gracefulDecommissionTimeout": 60, + "scheduleBasedConfig": { + "schedules": [ + { + "days": [ + "Monday", + "Tuesday", + "Wednesday" + ], + "startTime": "09:00", + "endTime": "10:00", + "count": 2 + }, + { + "days": [ + "Sunday", + "Saturday" + ], + "startTime": "12:00", + "endTime": "22:00", + "count": 5 + }, + { + "days": [ + "Monday", + "Tuesday", + "Wednesday", + "Thursday", + "Friday" + ], + "startTime": "22:00", + "endTime": "23:59", + "count": 6 + }, + { + "days": [ + "Monday", + "Tuesday", + "Wednesday", + "Thursday", + "Friday" + ], + "startTime": "00:00", + "endTime": "05:00", + "count": 6 + } + ], + "timeZone": "UTC", + "defaultCount": 110 + } + } +} +``` +> [!TIP] +> * You are required to set non-conflicting schedules using ARM deployments, to avoid scaling operation failures. ++**Load based auto scale** ++You can create an HDInsight on AKS cluster with load-based Autoscaling using an Azure Resource Manager template, by adding an autoscale to the clusterProfile -> autoscaleProfile section. ++The autoscale node contains ++* a poll interval, cool down period, +* graceful decommission, +* minimum and maximum nodes, +* standard threshold rules, +* scaling metrics that describes when the change takes place. ++For a complete Resource Manager template, see sample JSON as follows ++```json + { + "autoscaleProfile": { + "enabled": true, + "autoscaleType": "LoadBased", + "gracefulDecommissionTimeout": 60, + "loadBasedConfig": { + "minNodes": 2, + "maxNodes": 157, + "pollInterval": 300, + "cooldownPeriod": 180, + "scalingRules": [ + { + "actionType": "scaleup", + "comparisonRule": { + "threshold": 80, + "operator": " greaterThanOrEqual" + }, + "evaluationCount": 1, + "scalingMetric": "allocatedCoresPercentage" + }, + { + "actionType": "scaledown", + "comparisonRule": { + "threshold": 20, + "operator": " lessThanOrEqual" + }, + "evaluationCount": 1, + "scalingMetric": "allocatedCoresPercentage" + } + ] + } + } +} +``` ++### Using the REST API +To enable or disable Auto scale on a running cluster using the REST API, make a PATCH request to your Auto scale endpoint: ```https://management.azure.com/subscriptions/{{USER_SUB}}/resourceGroups/{{USER_RG}}/providers/Microsoft.HDInsight/clusterpools/{{CLUSTER_POOL_NAME}}/clusters/{{CLUSTER_NAME}}?api-version={{HILO_API_VERSION}}``` ++- Use the appropriate parameters in the request payload. The json payload could be used to enable Auto scale. +- Use the payload (**autoscaleProfile**: null) or use flag (**enabled**, false) to disable Auto scale. +- Refer to the JSON samples mentioned on the above step for reference. ++### Pause Auto scale for a running cluster ++We have introduced pause feature in Auto scale. 
Now, using the Azure portal, you can pause Auto scale on a running cluster. The following diagram illustrates how to pause and resume Auto scale. +++You can resume Auto scale whenever you want scaling operations to continue. +++> [!TIP] +> When you configure multiple schedules and pause Auto scale, the next schedule isn't triggered. The node count remains the same, even if the nodes are in a decommissioned state. ++### Copy Auto Scale Configurations ++Using the Azure portal, you can copy the same Auto scale configuration across clusters of the same shape in your cluster pool. You can use this feature to export or import the configurations. +++## Monitoring Auto scale activities ++### Cluster status ++The cluster status listed in the Azure portal can help you monitor Auto scale activities. All of the cluster status messages that you might see are explained in the following list. ++|Cluster status |Description| +|-|-| +|Succeeded |The cluster is operating normally. All of the previous Auto scale activities have been completed successfully.| +|Accepted |The cluster operation (for example, scale up) is accepted and is waiting to complete.| +|Failed |The current operation failed, and the cluster might not be functional. | +|Canceled |The current operation was canceled.| +++To view the current number of nodes in your cluster, go to the **Cluster size** chart on the **Overview** page for your cluster. ++++### Operation history ++You can view the cluster scale-up and scale-down history as part of the cluster metrics. You can also list all scaling actions over the past day, week, or another period. ++++**Additional resources** ++[Manual scale - Azure HDInsight on AKS](./manual-scale.md) |
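As a companion to the REST API section earlier in this article, the following is a minimal sketch of a PATCH request body that disables Auto scale by setting the `enabled` flag to `false`. The outer `properties` and `clusterProfile` wrapping is an assumption based on the clusterProfile -> autoscaleProfile structure described above; as noted earlier, passing `"autoscaleProfile": null` is the other documented way to disable it.

```json
{
  "properties": {
    "clusterProfile": {
      "autoscaleProfile": {
        "enabled": false
      }
    }
  }
}
```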
hdinsight-aks | Hdinsight On Aks Manage Authorization Profile | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/hdinsight-on-aks-manage-authorization-profile.md | + + Title: Manage cluster access +description: How to manage cluster access in HDInsight on AKS ++ Last updated : 08/4/2023+++# Manage cluster access +++This article provides an overview of the mechanisms available to manage access for HDInsight on AKS cluster pools and clusters. +It also covers how to assign permission to users, groups, user-assigned managed identity, and service principals to enable access to cluster data plane. ++When a user creates a cluster, then that user is authorized to perform the operations with data accessible to the cluster. However, to allow other users to execute queries and jobs on the cluster, access to cluster data plane is required. +++## Manage cluster pool or cluster access (Control plane) ++The following HDInsight on AKS and Azure built-in roles are available for cluster management to manage the cluster pool or cluster resources. ++|Role|Description| +|-|-| +|Owner |Grants full access to manage all resources, including the ability to assign roles in Azure RBAC.| +|Contributor |Grants full access to manage all resources but doesn't allow you to assign roles in Azure RBAC.| +|Reader |View all resources but doesn't allow you to make any changes.| +|HDInsight on AKS Cluster Pool Admin |Grants full access to manage a cluster pool including ability to delete the cluster pool.| +|HDInsight on AKS Cluster Admin |Grants full access to manage a cluster including ability to delete the cluster.| ++You can use Access control (IAM) blade to manage the access for cluster poolΓÇÖs and control plane. ++Refer: [Grant a user access to Azure resources using the Azure portal - Azure RBAC](/azure/role-based-access-control/quickstart-assign-role-user-portal). ++## Manage cluster access (Data plane) ++This access enables you to do the following actions: +* View clusters and manage jobs. +* All the monitoring and management operations. +* To enable auto scale and update the node count. + +The access is restricted for: +* Cluster deletion. ++To assign permission to users, groups, user-assigned managed identity, and service principals to enable access to clusterΓÇÖs data plane, the following options are available: ++ * [Azure portal](#using-azure-portal) + * [ARM template](#using-arm-template) ++### Using Azure portal ++#### How to grant access + +The following steps describe how to provide access to other users, groups, user-assigned managed identity, and service principals. ++1. Navigate to the **Cluster access** blade of your cluster in the Azure portal and click **Add**. ++ :::image type="content" source="./media/hdinsight-on-aks-manage-authorization-profile/cluster-access.png" alt-text="Screenshot showing how to provide access to a user for cluster access."::: ++1. Search for the user/group/user-assigned managed identity/service principal to grant access and click **Add**. ++ :::image type="content" source="./media/hdinsight-on-aks-manage-authorization-profile/add-members.png" alt-text="Screenshot showing how to add member for cluster access."::: ++#### How to remove access ++1. Select the members to be removed and click **Remove**. 
++ :::image type="content" source="./media/hdinsight-on-aks-manage-authorization-profile/remove-access.png" alt-text="Screenshot showing how to remove cluster access for a member."::: ++### Using ARM template ++#### Prerequisites ++* An operational HDInsight on AKS cluster. +* [ARM template](./create-cluster-using-arm-template-script.md) for your cluster. +* Familiarity with [ARM template authoring and deployment](/azure/azure-resource-manager/templates/overview). + +Follow the steps to update `authorizationProfile` object under `clusterProfile` section in your cluster ARM template. ++1. In the Azure portal search bar, search for user/group/user-assigned managed identity/service principal. ++ :::image type="content" source="./media/hdinsight-on-aks-manage-authorization-profile/search-object-id.png" alt-text="Screenshot showing how to search object ID."::: + +1. Copy the **Object ID** or **Principal ID**. ++ :::image type="content" source="./media/hdinsight-on-aks-manage-authorization-profile/view-object-id.png" alt-text="Screenshot showing how to view object ID."::: ++1. Modify the `authorizationProfile` section in your cluster ARM template. ++ 1. Add user/user-assigned managed identity/service principal Object ID or Principal ID under `userIds` property. + + 1. Add groups Object ID under `groupIds` property. + + ```json + "authorizationProfile": { + "userIds": [ + "abcde-12345-fghij-67890", + "a1b1c1-12345-abcdefgh-12345" + ], + "groupIds": [] + }, + ``` + + 1. Deploy the updated ARM template to reflect the changes in your cluster. Learn how to [deploy an ARM template](/azure/azure-resource-manager/templates/deploy-portal). |
hdinsight-aks | How To Azure Monitor Integration | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/how-to-azure-monitor-integration.md | + + Title: How to integrate with Azure Monitor +description: Learn how to integrate with Azure Monitoring. ++ Last updated : 08/29/2023+++# How to integrate with Log Analytics +++This article describes how to enable Log Analytics to monitor & collect logs for cluster pool and cluster operations on HDInsight on AKS. You can enable the integration during cluster pool creation or post the creation. +Once the integration at cluster pool is enabled, it isn't possible to disable the integration. However, you can disable the log analytics for individual clusters, which are part of the same pool. ++## Prerequisites ++* Log Analytics workspace. You can think of this workspace as a unique logs environment with its own data repository, data sources, and solutions. Learn how to [create a Log Analytics workspace](/azure/azure-monitor/logs/quick-create-workspace?tabs=azure-portal) . ++ > [!NOTE] + > 1. Log Analytics must be enabled at cluster pool level first so as to enable it at a cluster level. + > + > 2. The configuration at cluster pool level is a global switch for all clusters in the cluster pool, therefore all clusters in the same cluster pool can only flow log to one Log Analytics workspace. ++## Enable Log Analytics using the portal during cluster pool creation ++1. Sign in to [Azure portal](https://portal.azure.com). + +1. Select **Create a resource** and search for *cluster pool* in marketplace and select **Azure HDInsight on AKS Cluster Pool**. For more information on starting the cluster pool creation process, see [Create a cluster pool](quickstart-create-cluster.md#create-a-cluster-pool). + +1. Navigate to the Integrations blade, select **Enable Log Analytics**. ++ :::image type="content" source="./media/how-to-azure-monitor-integration/enable-log-analytics.png" alt-text="Screenshot showing how to enable log analytics option." border="true" lightbox="./media/how-to-azure-monitor-integration/enable-log-analytics.png"::: + +1. From the drop-down list, select an existing Log Analytics workspace. Complete the remaining details required to finish cluster pool creation and select **Create**. + +1. Log Analytics is enabled when the cluster pool is successfully created. All monitoring capabilities can be accessed under your cluster poolΓÇÖs **Monitoring** section. + + :::image type="content" source="./media/how-to-azure-monitor-integration/monitor-section.png" alt-text="Screenshot showing monitoring section in the Azure portal." lightbox="./media/how-to-azure-monitor-integration/monitor-section.png"::: ++## Enable Log Analytics using portal after cluster pool creation ++1. In the Azure portal search bar, type "HDInsight on AKS cluster pools" and select *Azure HDInsight on AKS cluster pools* to go to the cluster pools page. On the HDInsight on AKS cluster pools page, select your cluster pool. ++ :::image type="content" source="./media/how-to-azure-monitor-integration/cluster-pool-get-started.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster pool." border="true" lightbox="./media/how-to-azure-monitor-integration/cluster-pool-get-started.png"::: ++ :::image type="content" source="./media/how-to-azure-monitor-integration/cluster-pool-in-list-view.png" alt-text="Screenshot showing cluster pools in a list view." 
border="true" lightbox="./media/how-to-azure-monitor-integration/cluster-pool-in-list-view.png"::: ++1. Navigate to the "Monitor settings" blade on the left side menu and click on "Configure" to enable Log Analytics. + + :::image type="content" source="./media/how-to-azure-monitor-integration/cluster-pool-integration.png" alt-text="Screenshot showing cluster pool integration blade." border="true" lightbox="./media/how-to-azure-monitor-integration/cluster-pool-integration.png"::: ++1. Select an existing Log Analytics workspace, and click **Ok**. ++ :::image type="content" source="./media/how-to-azure-monitor-integration/enable-cluster-pool-log-analytics.png" alt-text="Screenshot showing how to enable cluster pool log analytics." border="true" lightbox="./media/how-to-azure-monitor-integration/enable-cluster-pool-log-analytics.png"::: + +## Enable Log Analytics using the portal during cluster creation ++1. In the Azure portal search bar, type "HDInsight on AKS cluster pools" and select *Azure HDInsight on AKS cluster pools* to go to the cluster pools page. On the HDInsight on AKS cluster pools page, select the cluster pool in which you want to create a cluster. + + :::image type="content" source="./media/how-to-azure-monitor-integration/cluster-pool-get-started.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster pool." border="true" lightbox="./media/how-to-azure-monitor-integration/cluster-pool-get-started.png"::: ++ :::image type="content" source="./media/how-to-azure-monitor-integration/cluster-pool-in-list-view.png" alt-text="Screenshot showing cluster pools in a list view." border="true" lightbox="./media/how-to-azure-monitor-integration/cluster-pool-in-list-view.png"::: ++ > [!NOTE] + > It is important to make sure that the selected cluster pool has Log Analytics enabled. ++1. Select **New Cluster** to start the creation process. For more information on starting the cluster creation process, see [Create a cluster](./quickstart-create-cluster.md). ++ :::image type="content" source="./media/how-to-azure-monitor-integration/new-cluster.png" alt-text="Screenshot showing New cluster button in the Azure portal." border="true" lightbox="./media/how-to-azure-monitor-integration/new-cluster.png"::: ++1. Navigate to the Integrations blade, select **Enable Log Analytics**. ++1. Select one or more type of logs you would like to collect. Complete the remaining details required to finish the cluster creation and select **Create**. ++ :::image type="content" source="./media/how-to-azure-monitor-integration/select-log-type.png" alt-text="Screenshot showing how to select log type." border="true" lightbox="./media/how-to-azure-monitor-integration/select-log-type.png"::: + + > [!NOTE] + > If no option is selected, then only AKS service logs will be available. ++2. Log Analytics is enabled when the cluster is successfully created. All monitoring capabilities can be accessed under your cluster's **Monitoring** section. + + :::image type="content" source="./media/how-to-azure-monitor-integration/monitor-section-cluster.png" alt-text="Screenshot showing Monitoring section for cluster in the Azure portal."::: ++## Enable Log Analytics using portal after cluster creation ++1. In the Azure portal top search bar, type "HDInsight on AKS clusters" and select *Azure HDInsight on AKS clusters* from the drop-down list. On the HDInsight on AKS cluster pools page, select your cluster name from the list page. 
+ + :::image type="content" source="./media/how-to-azure-monitor-integration/get-started-portal-search-step-1.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster." lightbox="./media/how-to-azure-monitor-integration/get-started-portal-search-step-1.png"::: ++ :::image type="content" source="./media/how-to-azure-monitor-integration/get-started-portal-list-view-step-2.png" alt-text="Screenshot showing selecting the HDInsight on AKS Cluster you require from the list." border="true" lightbox="./media/how-to-azure-monitor-integration/get-started-portal-list-view-step-2.png"::: ++1. Navigate to the "Monitor settings" blade, select **Enable Log Analytics**. Choose one or more type of logs you would like to collect, and click **Save**. + + :::image type="content" source="./media/how-to-azure-monitor-integration/select-more-log-types.png" alt-text="Screenshot showing how to select more log types." border="true" lightbox="./media/how-to-azure-monitor-integration/select-more-log-types.png"::: + + > [!NOTE] + > If no option is selected, then only AKS service logs will be available. ++## Access the log tables and run queries using the portal ++1. From the Azure portal, select your cluster pool or cluster of choice to open it. ++1. Navigate to the **Monitoring** section and select the **Logs** blade to query and analyze the collected data. ++ :::image type="content" source="./media/how-to-azure-monitor-integration/monitoring-logs.png" alt-text="Screenshot showing logs in the Azure portal."::: ++1. A list of commonly used query templates is provided to choose from to simplify the process or you can write your own query using the provided console. ++ :::image type="content" source="./media/how-to-azure-monitor-integration/queries.png" alt-text="Screenshot showing queries in the Azure portal." border="true" lightbox="./media/how-to-azure-monitor-integration/queries.png"::: ++ :::image type="content" source="./media/how-to-azure-monitor-integration/new-query.png" alt-text="Screenshot showing New queries in the Azure portal." border="true" lightbox="./media/how-to-azure-monitor-integration/new-query.png"::: + |
hdinsight-aks | Manage Cluster Pool | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/manage-cluster-pool.md | + + Title: Manage cluster pools +description: Manage cluster pools in HDInsight on AKS. ++ Last updated : 08/29/2023+++# Manage cluster pools +++Cluster pools are a logical grouping of clusters and maintain a set of clusters in the same pool. It helps in building robust interoperability across multiple cluster types and allow enterprises to have the clusters in the same virtual network. One cluster pool corresponds to one cluster in AKS infrastructure. ++This article describes how to manage a cluster pool. ++> [!NOTE] +> You are required to have an operational cluster pool, Learn how to create a [cluster pool](./quickstart-create-cluster.md). ++## Get started ++1. Sign in to [Azure portal](https://portal.azure.com). + +1. In the Azure portal search bar, type "HDInsight on AKS cluster pool" and select "Azure HDInsight on AKS cluster pools" from the drop-down list. + + :::image type="content" source="./media/manage-cluster-pool/cluster-pool-get-started.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster pool." border="true" lightbox="./media/manage-cluster-pool/cluster-pool-get-started.png"::: + +1. Select your cluster pool name from the list page. + + :::image type="content" source="./media/manage-cluster-pool/cluster-pool-in-list-view.png" alt-text="Screenshot showing cluster pools in a list view." border="true" lightbox="./media/manage-cluster-pool/cluster-pool-in-list-view.png"::: ++## Create new cluster + +In a cluster pool, you can add multiple clusters of different types. For example, you can have a Trino cluster and an Apache Flink cluster inside the same pool. ++To create a new cluster, click on the **+New cluster** on the Azure portal and continue to use the Azure portal to create a Trino, Apache Flink, and Apache Spark cluster. ++Learn more on how to [create a cluster](./quickstart-create-cluster.md). +++## View the list of existing clusters ++You can view the list of clusters in the cluster pool on the **Overview** tab. ++ +## Manage access to the cluster pool + +HDInsight on AKS supports both Azure built-in roles and certain roles specific to HDInsight on AKS. In the Azure portal, you can use Access control (IAM) blade in your pool to manage the access for cluster pool’s control plane. ++For more information, see [manage access](./hdinsight-on-aks-manage-authorization-profile.md). ++## Enable integration with Azure services ++ In the Azure portal, use Monitor settings blade in your cluster pool to configure the supported Azure services. Currently, we support Log Analytics and Azure managed Prometheus and Grafana, which has to be configured at cluster pool before you can enable at cluster level. ++ * Learn more about [Azure Monitor Integration](./how-to-azure-monitor-integration.md). + * For more information, see [how to enable Log Analytics](./how-to-azure-monitor-integration.md). + * For more information, see [how to enable Azure managed Prometheus and Grafana](./monitor-with-prometheus-grafana.md). + + +## Delete cluster pool ++ Deleting the cluster pool deletes the following resources: + + * All the clusters that are part of the cluster pool. + * Managed resource groups created during cluster pool creation to hold the ancillary resources. + + However, it doesn't delete the external resources associated with the cluster pool or cluster. 
For example, Key Vault, storage accounts, and monitoring workspaces. ++ Each cluster pool version is associated with an AKS version. When an AKS version is deprecated, you'll be notified. In this case, you need to delete the cluster pool and recreate it to move to a supported AKS version. +++ > [!Note] + > You can't recover a deleted cluster pool. Be careful when deleting a cluster pool. ++ 1. To delete the cluster pool, click on "Delete" at the top left in the "Overview" blade in the Azure portal. + + :::image type="content" source="./media/manage-cluster-pool/delete-cluster-pool.png" alt-text="Screenshot showing how to delete cluster pool."::: + + 1. Enter the pool name to be deleted, and then click **Delete**. + + :::image type="content" source="./media/manage-cluster-pool/cluster-pool-delete-cluster.png" alt-text="Screenshot showing how to delete cluster pool, and updating your cluster pool name once you click delete."::: ++ Once the deletion is successful, you can check the status by clicking the Notifications icon ![Screenshot showing the Notifications icon in the Azure portal.](./media/manage-cluster-pool/notifications.png) in the Azure portal. ++ :::image type="content" source="./media/manage-cluster-pool/cluster-pool-delete-cluster-notification.png" alt-text="Screenshot showing a notification alert of a successful cluster pool deletion."::: |
hdinsight-aks | Manage Cluster | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/manage-cluster.md | + + Title: Manage clusters +description: Manage clusters in HDInsight on AKS. ++ Last updated : 08/29/2023+++# Manage clusters +++Clusters are individual compute workloads such as Apache Spark, Apache Flink, and Trino, which can be created rapidly in few minutes with preset configurations and few clicks. ++This article describes how to manage a cluster using Azure portal. + +> [!NOTE] +> You are required to have an operational cluster, Learn how to create a [cluster](./quickstart-create-cluster.md). ++## Get started ++1. Sign in to [Azure portal](https://portal.azure.com). + +1. In the Azure portal search bar, type "HDInsight on AKS clusters" and select "Azure HDInsight on AKS clusters" from the drop-down list. + + :::image type="content" source="./media/manage-cluster/get-started-portal-search-step-1.png" alt-text="Screenshot showing search option for getting started with HDInsight on AKS Cluster." border="true" lightbox="./media/manage-cluster/get-started-portal-search-step-1.png"::: + +1. Select your cluster name from the list page. + + :::image type="content" source="./media/manage-cluster/get-started-portal-list-view-step-2.png" alt-text="Screenshot showing selecting the HDInsight on AKS Cluster you require from the list." border="true" lightbox="./media/manage-cluster/get-started-portal-list-view-step-2.png"::: ++## View cluster details ++You can view the cluster details in the "Overview" blade of your cluster. It provides general information and easy access to the tools that are part of the cluster. ++|Property|Description| +|-|-| +|Resource group| The resource group in which cluster is created.| +|Cluster pool name| Cluster pool name inside which the cluster is created.| +|Cluster type| The type of the cluster such as Spark, Trino, or Flink.| +|HDInsight on AKS version| HDInsight on AKS cluster version. For more information. see [versioning](./versions.md).| +|Cluster endpoint| The endpoint of the cluster.| +|Cluster package| Component versions associated with the cluster.| +|Subscription details| Subscription name and subscription ID.| +|Location| Region in which the cluster is deployed.| +|Cluster size details| Node size, node type, and number of nodes.| +++## Manage cluster size ++You can check and modify the number of worker nodes for your cluster using "Cluster size" blade in the Azure portal. There are two options to scale up/down your cluster: ++* [Manual scale](./manual-scale.md) +* [Auto scale](./hdinsight-on-aks-autoscale-clusters.md) +++## Manage cluster access ++HDInsight on AKS provides a comprehensive and fine-grained access control at both control plane and data plane, which allows you to manage cluster resources and provide access to cluster data plane. ++Learn how to [manage access to your cluster](./hdinsight-on-aks-manage-authorization-profile.md). ++## Configure secure shell (SSH) ++Secure shell (SSH) allows you to submit jobs and queries to your cluster directly. You can enable or disable SSH using "Secure shell (SSH)" blade in the Azure portal. + +>[!NOTE] +>Enabling SSH will create additional VMs in the cluster. The maximum allowed secure shell nodes are 5. +++## Manage cluster configuration ++HDInsight on AKS allows you to tweak the configuration properties to improve performance of your cluster with certain settings. For example, usage or memory settings. 
+In the Azure portal, use "Configuration management" blade of your cluster to manage the configurations. ++You can do the following actions: ++* Update the existing service configurations or add new configurations. +* Export the service configurations using RestAPI. ++Learn how to manage the [cluster configuration](./service-configuration.md). ++## View service details ++In the Azure portal, use "Services" blade in your cluster to check the health of the services running in your cluster. It includes the collection of the services and the status of each service running in the cluster. You can drill down on each service to check instance level details. ++Learn how to check [service health](./service-health.md). ++## Enable integration with Azure services ++In the Azure portal, use "Integrations" blade in your cluster pool to configure the supported Azure services. Currently, we support Log Analytics and Azure managed Prometheus and Grafana, which has to be configured at cluster pool before you can enable at cluster level. ++* Learn more about [Azure Monitor Integration](./how-to-azure-monitor-integration.md). +* For more information, see [how to enable Log Analytics](./how-to-azure-monitor-integration.md). +* For more information, see [how to enable Azure managed Prometheus and Grafana](./monitor-with-prometheus-grafana.md). +++## Delete cluster ++Deleting a cluster doesn't delete the default storage account nor any linked storage accounts. You can re-create the cluster by using the same storage accounts and the same metastores. ++From the "Overview" blade in the Azure portal: ++1. Select **Delete** from the top menu. ++ :::image type="content" source="./media/manage-cluster/delete-cluster-step-7.png" alt-text="Screenshot showing how to delete the cluster on HDInsight on AKS Cluster." border="true" ::: +1. Status can be checked on Notification icon ![Screenshot showing the Notifications icon in the Azure portal.](./media/manage-cluster/notifications.png). |
hdinsight-aks | Manage Script Actions | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/manage-script-actions.md | + + Title: Manage script actions on Azure HDInsight on AKS clusters +description: An introduction on how to manage script actions in Azure HDInsight on AKS. ++ Last updated : 08/29/2023++# Script actions during cluster creation +++Azure HDInsight on AKS provides a mechanism called **Script Actions** that invoke custom scripts to customize the cluster. These scripts are used to install additional components and change configuration settings. Script actions can be provisioned only during cluster creation as of now. Post cluster creation, Script Actions are part of the roadmap. +This article explains how you can provision script actions when you create an HDInsight on AKS cluster. ++## Use a script action during cluster creation using Azure portal ++1. Upload the script action in a `ADLS/WASB` storage(does not have to be the primary cluster storage). In this example we will consider an `ADLS` storage. + To upload a script into your storage, navigate into the target storage and the container where you want to upload it. ++ :::image type="content" source="./media/manage-script-actions/upload-script-action-1.png" alt-text="Screenshot showing the how to select container." border="true" lightbox="./media/manage-script-actions/upload-script-action-1.png"::: ++1. To upload a script into your storage, navigate into the target storage and the container. Click on the upload button and select the script from your local drive. + After the script gets uploaded you should be able to see it in the container(see below image). ++ :::image type="content" source="./media/manage-script-actions/upload-script-action-2.png" alt-text="Screenshot showing how to upload the script." border="true" lightbox="./media/manage-script-actions/upload-script-action-2.png"::: + +1. Create a new cluster as described [here](./quickstart-create-cluster.md). + +1. From the Configuration tab, select **+ Add script action**. + + :::image type="content" source="./media/manage-script-actions/manage-script-action-creation-step-1.png" alt-text="Screenshot showing the New cluster page with Add Script action button in the Azure portal." border="true" lightbox="./media/manage-script-actions/manage-script-action-creation-step-1.png"::: ++ This action opens the Script Action window. Provide the following details: ++ :::image type="content" source="./media/manage-script-actions/manage-script-action-add-step-2.png" alt-text="Screenshot showing the Add Script action window opens in the Azure portal."::: + + |Property|Description| + |-|-| + |Script Action Name| Unique name of the script action.| + |Bash Script URL| Location where the script is stored. For example - `abfs://<CONTAINER>@<DATALAKESTOREACCOUNTNAME>.dfs.core.windows.net/<file_path>`, update the data lake storage name and file path.| + |Services| Select the specific service components where the Script Action needs to run.| + |Parameters| Specify the parameters, if necessary for the script.| + |`TimeOutInMinutes`|Choose the timeout for each script| + + :::image type="content" source="./media/manage-script-actions/manage-script-action-add-node-type-step-3.png" alt-text="Screenshot showing the list of services where to the apply the script actions." border="true" lightbox="./media/manage-script-actions/manage-script-action-add-node-type-step-3.png"::: ++ > [!NOTE] + > * All the Script Actions will be persisted. 
+ > * Script actions are available only for Apache Spark cluster type. + +1. Select ‘OK’ to save the script. +1. Then you can again use **+ Add Script Action** to add another script if necessary. + + :::image type="content" source="./media/manage-script-actions/manage-script-action-view-scripts-step-4.png" alt-text="Screenshot showing the View scripts section in the integration tab." border="true" lightbox="./media/manage-script-actions/manage-script-action-view-scripts-step-4.png"::: + +1. Complete the remaining cluster creation steps to create a cluster. ++ >[!Important] + >* There's no automatic way to undo the changes made by a script action. + >* Script actions must finish within **40 minutes**, or they time out causing cluster creation to fail. + >* During cluster provisioning, the script runs concurrently with other setup and configuration processes. + >* Competition for resources such as CPU time or network bandwidth might cause the script to take longer to finish. + >* To minimize the time it takes to run the script, avoid tasks like downloading and compiling applications from the source. Precompile applications and store the binary in Azure Data Lake Store Gen2. ++### View the list of Script Actions ++1. You can view the list of Script Actions in the "Configuration" tab. ++ :::image type="content" source="./media/manage-script-actions/manage-script-action-view-scripts-step-5.png" alt-text="Screenshot showing the Create to save Script actions page." border="true" lightbox="./media/manage-script-actions/manage-script-action-view-scripts-step-5.png"::: + + |
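For planning purposes, the details the script action form asks for can be summarized as a small JSON record. The snippet below is purely hypothetical: every field name mirrors the portal form above (Script Action Name, Bash Script URL, Services, Parameters, TimeOutInMinutes) rather than any confirmed ARM schema, and the storage account, container, and service names are placeholders.

```json
{
  "scriptActionName": "install-custom-libs",
  "bashScriptUrl": "abfs://scripts@contosostorage.dfs.core.windows.net/actions/install-libs.sh",
  "services": [ "spark-head", "spark-worker" ],
  "parameters": "--version 1.2",
  "timeoutInMinutes": 30
}
```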
hdinsight-aks | Manual Scale | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/manual-scale.md | + + Title: Manual scale +description: How to manually scale in HDInsight on AKS. ++ Last updated : 08/29/2023+++# Manual scale +++HDInsight on AKS provides elasticity with options to scale up and scale down the number of cluster nodes. This elasticity works to help increase resource utilization and improve cost efficiency. ++## Utility to scale clusters ++HDInsight on AKS provides the following methods to manually scale clusters: ++| Utility| Description| +||| +|Azure portal| Open your HDInsight on AKS cluster pane, select **Cluster size** on the left-hand menu, then on the Cluster size pane, type in the number of worker nodes, and select Save | +|REST API|To scale a running HDInsight on AKS cluster using the REST API, make a subsequent POST request on the same resource with the updated count in the compute profile.| ++You can use the Azure portal to access the “Cluster size” menu in the cluster navigation page. In Cluster size blade, change the “Number of worker nodes,” and save the change to scale up or down the cluster. +++## Impact of scaling operation on a cluster ++Any scaling operation triggers a restart of the service, which can lead to errors on jobs already running. ++When you **add nodes** to an operational HDInsight on AKS cluster (scale up): ++- Successful scaling operation using manual scale will add worker nodes to the cluster. +- New jobs can be safely submitted when the scaling process is completed. +- If the scaling operation fails, the failure leaves your cluster in the "Failed” state. +- You can expect to experience job failures during the scaling operation as services get restarted. ++If you **remove nodes** (scale down) from an HDInsight on AKS cluster: + +- Pending or running jobs fails when the scaling operation completes. This failure is because of some of the services restarting during the scaling process. The impact of changing the number of cluster nodes varies for each cluster type. ++>[!IMPORTANT] +>- To avoid quota errors during scaling operations, please plan for quota in your subscription. In case you have insufficient quota, you can increase quota with this [documentation](/azure/quotas/regional-quota-requests). +>- In case scale down selects a head node, which hosts coordinator/ingress and other services, it will result in downtime. ++## Frequently Asked Questions ++### General ++|Question|Answer| +| -- | -- | +|What are the minimum nodes that I can add/remove during scale operations?|One Node.| +|What's the maximum limit to scale up an HDInsight on AKS Trino cluster?|100 nodes (in public preview).| +|How do I manually scale down my cluster?|In the ARM request, update `computeProfile.count` or follow the steps mentioned to scale down using Azure portal.| +|Can I add custom script actions to a cluster during manual scale?|Script actions are applicable for Apache Spark cluster type| +|How do I get logs for manual scale failures for the cluster nodes?|Logs are available in Log analytics module, refer the [Azure Monitor Integration](./how-to-azure-monitor-integration.md).| +|Is load based or schedule based autoscaling supported?|Yes. 
For more information, see [Autoscale](./hdinsight-on-aks-autoscale-clusters.md).| ++### Trino ++|Question|Answer| +| -- | -- | +|Will my Trino service restart after a scaling operation?|Yes, the service restarts during the scaling operation.| ++### Apache Flink ++|Question|Answer| +| -- | -- | +|What’s the impact of scaling operations on an Apache Flink cluster?|Any scaling operation is likely to trigger a restart of the service, which causes job failures. New jobs can be submitted when the scaling process is completed. In Apache Flink, scale down triggers job restarts, while scale up doesn't trigger job restarts.| +++### Apache Spark ++|Question|Answer| +| -- | -- | +|What’s the impact of scaling operations on a Spark cluster?|A manual scale down operation may trigger a restart of head node services.| +++> [!NOTE] +> It is recommended that you manage the quotas set on the subscription prior to scaling operations to avoid quota errors. +> Before scaling down, note that an HDInsight on AKS Trino cluster requires a minimum of **five** active nodes to be operational. + |
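As a sketch of the REST approach mentioned in the table above (updating the compute profile with the new worker count), the request body would echo the cluster's existing compute profile with only the count changed. The exact property layout below is an assumption; retrieve your cluster resource with a GET first and modify only the worker node count before sending it back.

```json
{
  "properties": {
    "computeProfile": {
      "nodes": [
        {
          "type": "worker",
          "vmSize": "Standard_D8ds_v5",
          "count": 6
        }
      ]
    }
  }
}
```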
hdinsight-aks | Monitor With Prometheus Grafana | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/monitor-with-prometheus-grafana.md | + + Title: Monitoring with Azure Managed Prometheus and Grafana +description: Learn how to use monitor With Azure Managed Prometheus and Grafana ++ Last updated : 08/29/2023+++# Monitoring with Azure Managed Prometheus and Grafana +++Cluster and service Monitoring is integral part of any organization. Azure HDInsight on AKS comes with integrated monitoring experience with Azure services. In this article, we use managed Prometheus service with Azure Grafana dashboards for monitoring. ++[Azure Managed Prometheus](../azure-monitor/essentials/prometheus-metrics-overview.md) is a service that monitors your cloud environments. The monitoring is to maintain their availability and performance and workload metrics. It collects data generated by resources in your Azure instances and from other monitoring tools. The data is used to provide analysis across multiple sources. ++[Azure Managed Grafana](../managed-grafan) is a data visualization platform built on top of the Grafana software by Grafana Labs. It's built as a fully managed Azure service operated and supported by Microsoft. Grafana helps you bring together metrics, logs, and traces into a single user interface. With its extensive support for data sources and graphing capabilities, you can view and analyze your application and infrastructure telemetry data in real-time. ++This article covers the details of enabling the monitoring feature in HDInsight on AKS. ++## Prerequisites ++* An Azure Managed Prometheus workspace. You can think of this workspace as a unique Azure Monitor logs environment with its own data repository, data sources, and solutions. For the instructions, see [Create a Azure Managed Prometheus workspace](../azure-monitor/essentials/azure-monitor-workspace-manage.md). +* Azure Managed Grafana workspace. For the instructions, see [Create a Azure Managed Grafana workspace](../managed-grafan). +* An [HDInsight on AKS cluster](./quickstart-create-cluster.md). Currently, you can use Azure Managed Prometheus with the following HDInsight on AKS cluster types: + * Apache Spark + * Apache Flink + * Trino ++For the instructions on how to create an HDInsight on AKS cluster, see [Get started with Azure HDInsight on AKS](./overview.md). ++## Enabling Azure Managed Prometheus and Grafana ++The Azure Managed Prometheus and Grafana Monitoring must be configured at cluster pool level to enable it at cluster level. You need to consider various stages while enabling the Monitoring Solution. +++|#| Scenario |Enable |Disable | +|-|-|-|-| +|1| Cluster Pool -During Creation | `Not Supported` |`Default` | +|2| Cluster Pool – Post Creation | `Supported` | `Not Supported` | +|3| Cluster – During Creation | `Supported` | `Default` | +|4| Cluster – Post Creation |`Supported` |`Supported` | ++## During cluster pool creation ++Currently, Managed Prometheus **CANNOT** be enabled during Cluster Pool creation time. You can configure it post cluster pool creation. ++## Post cluster pool creation ++Monitoring can be enabled from the **Integrations** tab on an existing Cluster Pool View available in Azure portal. +You can use pre created workspaces or create a new one while your'e configuring the monitoring for the cluster pool. ++### Use precreated workspace ++1. Click on **configure** to enable Azure Prometheus monitoring. 
++ :::image type="content" source="./media/monitor-with-prometheus-grafana/integration-configure-tab.png" alt-text="Screenshot showing integration configure tab." border="true" lightbox="./media/monitor-with-prometheus-grafana/integration-configure-tab.png"::: ++1. Click on **Advanced Settings** to attach your pre created workspaces. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/advanced-settings.png" alt-text="Screenshot showing advanced settings." border="true" lightbox="./media/monitor-with-prometheus-grafana/advanced-settings.png"::: ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/configure-prometheus-step-1.png" alt-text="Screenshot showing configure Prometheus step 1." border="true" lightbox="./media/monitor-with-prometheus-grafana/configure-prometheus-step-1.png"::: ++### Create Azure Prometheus and Grafana Workspace while enabling Monitoring in Cluster Pool ++You can create the workspaces from the HDI on AKS cluster pool page. ++1. Click on **Configure** next to the Azure Prometheus option. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/configure-prometheus-step-2.png" alt-text="Screenshot showing configure Prometheus step 2." border="true" lightbox="./media/monitor-with-prometheus-grafana/configure-prometheus-step-2.png"::: ++1. Click on **Create New** workspace for Azure Managed Prometheus. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/configure-prometheus-step-3.png" alt-text="Screenshot showing configure Prometheus step 3." border="true" lightbox="./media/monitor-with-prometheus-grafana/configure-prometheus-step-3.png"::: ++1. Fill in the name, region and click on **Create** for Prometheus. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/configure-prometheus-step-4.png" alt-text="Screenshot showing configure Prometheus step 4." border="true" lightbox="./media/monitor-with-prometheus-grafana/configure-prometheus-step-4.png"::: ++1. Click on **Create New** workspace for Azure Managed Grafana. +1. Fill in Name, Region and click on **Create** for Grafana. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/configure-prometheus-step-5.png" alt-text="Screenshot showing configure Prometheus step 5." border="true" lightbox="./media/monitor-with-prometheus-grafana/configure-prometheus-step-5.png"::: +++ > [!NOTE] + > 1. Managed Grafana can be enabled only if Managed Prometheus is enabled. + > 1. Once Azure Managed Prometheus workspace and Azure Managed Grafana workspace is enabled from the HDInsight on AKS cluster pool, it cannot be disabled from the cluster pool again. It must be disabled from the cluster level. ++## During cluster creation ++### Enable Azure Managed Prometheus during cluster creation ++1. Once the cluster pool is created and the Azure Managed Prometheus enabled, user must [create a HDI on AKS cluster in the same cluster pool](./trino/trino-create-cluster.md). ++1. During the cluster creation process, navigate to the **Integration** page and enable **Azure Prometheus.** ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/enable-prometheus-monitoring.png" alt-text="Screenshot showing enable prometheus monitoring." border="true" lightbox="./media/monitor-with-prometheus-grafana/enable-prometheus-monitoring.png"::: ++## Post cluster creation ++You can also enable Azure Managed Prometheus post HDI on AKS cluster creation ++1. Navigate to the Integrations tab in the cluster page. ++1. 
Enable Azure Prometheus monitoring with the toggle button and click **Save**. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/save-configuration.png" alt-text="Screenshot showing how to save configuration." border="true" lightbox="./media/monitor-with-prometheus-grafana/save-configuration.png"::: ++ > [!NOTE] + > Similarly, you can disable Azure Prometheus monitoring by turning off the toggle and clicking **Save**. ++### Enabling required permissions ++To view Azure Managed Prometheus and Azure Managed Grafana from the HDInsight on AKS portal, you need the following permissions. ++User permission: To view Azure Managed Grafana, the user needs the "Grafana Viewer" role on the Azure Managed Grafana workspace, assigned under Access control (IAM). Learn how to grant user access [here](../role-based-access-control/quickstart-assign-role-user-portal.md). ++1. Open the Grafana workspace configured in the cluster pool. +1. Select the role **Grafana Viewer**. +1. Select the username that accesses the Grafana dashboard. +1. Select the user and click **Review + assign**. ++ > [!NOTE] + > If you pre-create the Azure Managed Prometheus workspace, the Grafana identity requires the additional **Monitoring Reader** permission. + +1. In the Grafana workspace page (the one linked to the cluster), grant the **Monitoring Reader** permission on the Identity tab. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/role-assignment.png" alt-text="Screenshot showing how to assign role." border="true" lightbox="./media/monitor-with-prometheus-grafana/role-assignment.png"::: ++1. Click **Add role assignment.** +1. Select the following parameters: + 1. Scope as **Subscription**. + 1. The subscription name. + 1. Role as **Monitoring Reader**. + ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/role-assignment.png" alt-text="Screenshot showing how to assign role." border="true" lightbox="./media/monitor-with-prometheus-grafana/role-assignment.png"::: ++ > [!NOTE] + > For other roles available to Grafana users, see [here](../managed-grafan). + +## View metrics +You can use the Grafana dashboard to view service and system metrics. The following steps use a Trino cluster as an example and assume a few jobs have run on the cluster. ++1. Open the Grafana link on the cluster overview page. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/view-metrics.png" alt-text="Screenshot showing how to view metrics." border="true" lightbox="./media/monitor-with-prometheus-grafana/view-metrics.png"::: ++1. The default value on the Explore tab is **Grafana**. +1. Open the dropdown, select the `Managed Prometheus.…. <workspace name>` option, and select the required time frame. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/set-time-frame.png" alt-text="Screenshot showing how to set time frame." border="true" lightbox="./media/monitor-with-prometheus-grafana/set-time-frame.png"::: ++1. Select the metric you want to see. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/metric-type.png" alt-text="Screenshot showing how to select the metric type." border="true" lightbox="./media/monitor-with-prometheus-grafana/metric-type.png"::: ++1. Click **Run Query** and select how often the query should be run. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/run-query.png" alt-text="Screenshot showing how to run query." border="true" lightbox="./media/monitor-with-prometheus-grafana/run-query.png"::: ++1. View the metric for your selection. ++ :::image type="content" source="./media/monitor-with-prometheus-grafana/view-output.png" alt-text="Screenshot showing how to view the output." border="true" lightbox="./media/monitor-with-prometheus-grafana/view-output.png"::: |
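If you prefer to script the role assignments described in the permissions section, the following Azure CLI sketch grants the **Grafana Viewer** role to a user and the **Monitoring Reader** role to the Grafana workspace identity. The resource IDs and object IDs are placeholders, and the commands assume you are allowed to create role assignments at the chosen scopes.

```azurecli
# Placeholder IDs; replace them with your own values.
GRAFANA_ID="/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Dashboard/grafana/<grafana-name>"
USER_ID="<user-object-id-or-upn>"
GRAFANA_MSI="<grafana-workspace-identity-principal-id>"
SUBSCRIPTION_ID="<subscription-id>"

# Grant the user the Grafana Viewer role on the Grafana workspace.
az role assignment create \
  --assignee "$USER_ID" \
  --role "Grafana Viewer" \
  --scope "$GRAFANA_ID"

# Grant the Grafana workspace identity Monitoring Reader at subscription scope.
# This is needed when the Azure Managed Prometheus workspace was pre-created.
az role assignment create \
  --assignee-object-id "$GRAFANA_MSI" \
  --assignee-principal-type ServicePrincipal \
  --role "Monitoring Reader" \
  --scope "/subscriptions/$SUBSCRIPTION_ID"
```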
hdinsight-aks | Overview | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/overview.md | + + Title: What is Azure HDInsight on AKS? (Preview) +description: An introduction to Azure HDInsight on AKS. +++ Last updated : 08/29/2023+++# What is HDInsight on AKS? (Preview) ++ +HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). HDInsight on AKS allows you to deploy popular Open-Source Analytics workloads like Apache Spark, Apache Flink, and Trino without the overhead of managing and monitoring containers. +You can build end-to-end, petabyte-scale Big Data applications spanning streaming through Apache Flink, data engineering and machine learning using Apache Spark, and Trino's powerful query engine. ++All these capabilities combined with HDInsight on AKSΓÇÖs strong developer focus enables enterprises and digital natives with deep technical expertise to build and operate applications that are right fit for their needs. HDInsight on AKS allows developers to access all the rich configurations provided by open-source software and the extensibility to seamlessly include other ecosystem offerings. This offering empowers developers to test and tune their applications to extract the best performance at optimal cost. ++HDInsight on AKS integrates with the entire Azure ecosystem, shortening implementation cycles and improving time to realize value. + + + ## Technical architecture ++ HDInsight on AKS introduces the concept of cluster pools and clusters, which allow you to realize the complete value of data lakehouse. Cluster pools allow you to use multiple compute workloads on a single data lake, thereby removing the overhead of network management and resource planning. + +* **Cluster pools** are a logical grouping of clusters that help build robust interoperability across multiple cluster types and allow enterprises to have the clusters in the same virtual network. Cluster pools provide rapid and cost-effective access to all the cluster types created on-demand and at scale. +<br>One cluster pool corresponds to one cluster in AKS infrastructure. +* **Clusters** are individual compute workloads, such as Apache Spark, Apache Flink, and Trino, that can be created rapidly in few minutes with preset configurations. ++You can create the pool with a single cluster or a combination of cluster types, which are based on the need and can custom configure the following options: ++* Storage +* Network +* Logging +* Monitoring ++The following diagram shows the logical technical architecture of components installed in a default cluster pool. The clusters are isolated using [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) in AKS clusters. +++## Modernized cloud-native compute platform ++The latest version of HDInsight is orchestrated using AKS, which enables the platform to be more robust and empowers the users to handle the clusters effectively. Provisioning of clusters on HDInsight on AKS is fast and reliable, making it easy to manage clusters and perform in-place upgrades. With vast SKU choices and flexible subscription models, modernizing data lakehouses using open-source, cloud-native, and scalable infrastructure on HDInsight on AKS can meet all your analytics needs. + + +**Key features include:** +* Fast cluster creation and scaling. +* Ease of maintenance and periodic security updates. +* Cluster resiliency powered by modern cloud-native AKS. 
+* Native support for modern authentication with OAuth and Microsoft Entra ID (Azure Active Directory). +* Deep integration with Azure services: Azure Data Factory (ADF), Power BI, Azure Monitor. ++## Connectivity to HDInsight ++HDInsight on AKS can connect seamlessly with HDInsight. You can use the cluster types you need in a hybrid model and interoperate with HDInsight cluster types by using the same storage and metastore across both offerings. +++**The following scenarios are supported:** ++* [Flink connecting to HBase](./flink/use-flink-to-sink-kafka-message-into-hbase.md) +* [Flink connecting to Kafka](./flink/join-stream-kafka-table-filesystem.md) +* Spark connecting to HBase +* Spark connecting to Kafka ++## Security architecture ++HDInsight on AKS is secure by default. It enables enterprises to protect enterprise data assets with Azure Virtual Network, encryption, and integration with Microsoft Entra ID (Azure Active Directory). It also meets the most popular industry and government compliance standards, upholding Azure standards with over 30 certifications that help protect data, along with periodic updates, health advisor notifications, service health analytics, and best-in-class Azure security standards. HDInsight on AKS offers several methods to address your enterprise security needs by default. +For more information, see [HDInsight on AKS security](./concept-security.md). ++ +## Region availability (public preview) ++* West Europe +* Central India +* UK South +* Korea Central +* East US 2 +* West US 2 +* West US 3 +* East US |
hdinsight-aks | Prerequisites Resources | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/prerequisites-resources.md | + + Title: Resource prerequisites for Azure HDInsight on AKS +description: Prerequisite steps to complete for Azure resources before working with HDInsight on AKS. ++ Last updated : 08/29/2023+++# Resource prerequisites +++This article details the resources required for getting started with HDInsight on AKS. It covers the necessary and the optional resources and how to create them. ++## Necessary resources +The following table depicts the necessary resources that are required for cluster creation based on the cluster types. ++|Workload|Managed Service Identity (MSI)|Storage|SQL Server - SQL Database|Key Vault| +| ||||| +|Trino| ✅ | | | | +|Flink| ✅ | ✅ | | | +|Spark| ✅ | ✅ | | | +|Trino, Flink, or Spark with Hive Metastore (HMS)| ✅ | ✅ | ✅ | ✅ | ++> [!NOTE] +> MSI is used as a security standard for authentication and authorization across resources, except SQL Database. The role assignment occurs prior to deployment to authorize MSI to storage and the secrets are stored in Key vault for SQL Database. Storage support is with ADLS Gen2, and is used as data store for the compute engines, and SQL Database is used for table management on Hive Metastore. ++## Optional resources ++* Virtual Network (VNet) and Subnet: [Create virtual network](/azure/virtual-network/quick-create-portal) +* Log Analytics Workspace: [Create Log Analytics workspace](/azure/azure-monitor/logs/quick-create-workspace?tabs=azure-portal) ++> [!NOTE] +> +> * HDInsight on AKS allows you to bring your own VNet and Subnet, enabling you to customize your [network requirements](./secure-traffic-by-firewall.md) to suit the needs of your enterprise. +> * Log Analytics workspace is optional and needs to be created ahead in case you would like to use Azure Monitor capabilities like [Azure Log Analytics](./how-to-azure-monitor-integration.md). ++You can create the necessary resources in two ways: ++* [Ready-to-use ARM templates](#using-arm-templates) +* [Using Azure portal](#using-azure-portal) ++### Using ARM templates ++The following ARM templates allow you to create the specified necessary resources, in one click using a resource prefix and more details as required. ++For example, if you provide resource prefix as “demo” then, following resources are created in your resource group depending on the template you select - +* MSI is created with name as `demoMSI`. +* Storage is created with name as `demostore` along with a container as `democontainer`. +* Key vault is created with name as `demoKeyVault` along with the secret provided as parameter in the template. +* Azure SQL database is created with name as `demoSqlDB` along with SQL server with name as `demoSqlServer`. + +|Workload|Prerequisites| +||| +|Trino|**Create the resources mentioned as follows:** <br> 1. Managed Service Identity (MSI): user-assigned managed identity. <br><br> [![Deploy Trino to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FprerequisitesTrino.json)| +|Flink |**Create the resources mentioned as follows:** <br> 1. Managed Service Identity (MSI): user-assigned managed identity. <br> 2. ADLS Gen2 storage account and a container. <br><br> **Role assignments:** <br> 1. Assigns “Storage Blob Data Owner” role to user-assigned MSI on storage account. 
<br><br> [![Deploy Apache Flink to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FprerequisitesFlink.json)| +|Spark| **Create the resources mentioned as follows:** <br> 1. Managed Service Identity (MSI): user-assigned managed identity. <br> 2. ADLS Gen2 storage account and a container. <br><br> **Role assignments:** <br> 1. Assigns “Storage Blob Data Owner” role to user-assigned MSI on storage account. <br><br> [![Deploy Spark to Azure](https://aka.ms/deploytoazurebutton)]( https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2FprerequisitesSpark.json)| +|Trino, Flink, or Spark with Hive Metastore (HMS)|**Create the resources mentioned as follows:** <br> 1. Managed Service Identity (MSI): user-assigned managed identity. <br> 2. ADLS Gen2 storage account and a container. <br> 3. Azure Key Vault and a secret to store SQL Server admin credentials. <br><br> **Role assignments:** <br> 1. Assigns “Storage Blob Data Owner” role to user-assigned MSI on storage account. <br> 2. Assigns “Key Vault Secrets User” role to user-assigned MSI on Key Vault. <br><br> [![Deploy Trino HMS to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-aks%2Fmain%2FARM%2520templates%2Fprerequisites_WithHMS.json)| ++> [!NOTE] +> Using these ARM templates require a user to have permission to create new resources and assign roles to the resources in the subscription. ++### Using Azure portal ++#### [Create user-assigned managed identity (MSI)](/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp#create-a-user-assigned-managed-identity) ++ A managed identity is an identity registered in Microsoft Entra ID [(Azure Active Directory)](https://www.microsoft.com/security/business/identity-access/azure-active-directory) whose credentials managed by Azure. With managed identities, you need not register service principals in Azure AD to maintain credentials such as certificates. ++ HDInsight on AKS relies on user-assigned MSI for communication among different components. ++#### [Create storage account – ADLS Gen 2](/azure/storage/blobs/create-data-lake-storage-account) ++ The storage account is used as the default location for cluster logs and other outputs. + Enable hierarchical namespace during the storage account creation to use as ADLS Gen2 storage. ++ 1. [Assign a role](/azure/role-based-access-control/role-assignments-portal#step-2-open-the-add-role-assignment-page): Assign “Storage Blob Data Owner” role to the user-assigned MSI created to this storage account. + + 1. [Create a container](/azure/storage/blobs/blob-containers-portal#create-a-container): After creating the storage account, create a container in the storage account. + + > [!NOTE] + > Option to create a container during cluster creation is also available. ++#### [Create Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart) + + Create an Azure SQL Database to be used as an external metastore during cluster creation or you can use an existing SQL Database. However, ensure the following properties are set. 
+ + **Necessary properties to be enabled for SQL Server and SQL Database**- + + |Resource Type| Property| Description| + |-|-|-| + |SQL Server |Authentication method |While creating a SQL Server, use "Authentication method" as <br> :::image type="content" source="./media/prerequisites-resources/authentication-method.png" alt-text="Screenshot showing how to select authentication method." border="true" lightbox="media/prerequisites-resources/authentication-method.png":::| + |SQL Database |Allow Azure services and resources to access this server |Enable this property under Networking blade in your SQL database in the Azure portal.| ++ > [!NOTE] + > * Currently, we support only Azure SQL Database as inbuilt metastore. + > * Due to Hive limitation, "-" (hyphen) character in metastore database name is not supported. + > * Azure SQL Database should be in the same region as your cluster. + > * Option to create a SQL Database during cluster creation is also available. However, you need to refresh the cluster creation page to get the newly created database appear in the dropdown list. ++#### [Create Azure Key Vault](/azure/key-vault/general/quick-create-portal#create-a-vault) ++ Key Vault allows you to store the SQL Server admin password set during SQL Database creation. + HDInsight on AKS platform doesn’t deal with the credential directly. Hence, it's necessary to store your important credentials in the Key Vault. ++ 1. [Assign a role](/azure/role-based-access-control/role-assignments-portal#step-2-open-the-add-role-assignment-page): Assign “Key Vault Secrets User” role to the user-assigned MSI created as part of necessary resources to this Key Vault. + + 1. [Create a secret](/azure/key-vault/secrets/quick-create-portal#add-a-secret-to-key-vault): This step allows you to keep your SQL Server admin password as a secret in Azure Key Vault. Add your password in the “Value” field while creating a secret. ++ > [!NOTE] + > * Make sure to note the secret name, as this is required during cluster creation. + > * You need to have a “Key Vault Administrator” role assigned to your identity or account to add a secret in the Key Vault using Azure portal. Navigate to the Key Vault and follow the steps on [how to assign the role](/azure/role-based-access-control/role-assignments-portal#step-2-open-the-add-role-assignment-page). |
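As an alternative to the portal steps above, the following Azure CLI sketch creates the same set of prerequisite resources and role assignments. All names mirror the "demo" prefix example and are placeholders; the commands assume you can create resources and assign Azure RBAC roles in the subscription, and that your own account has rights to add secrets to the Key Vault.

```azurecli
# Placeholder names; adjust to your environment.
RG="demo-rg"
LOC="westeurope"
MSI_NAME="demoMSI"
STORAGE="demostore"
CONTAINER="democontainer"
KEYVAULT="demoKeyVault"
SQL_SERVER="demosqlserver"
SQL_DB="demoSqlDB"

# User-assigned managed identity.
az identity create --resource-group $RG --name $MSI_NAME
MSI_PRINCIPAL=$(az identity show --resource-group $RG --name $MSI_NAME --query principalId -o tsv)

# ADLS Gen2 storage account (hierarchical namespace enabled) and a container.
az storage account create --resource-group $RG --name $STORAGE --location $LOC --sku Standard_LRS --kind StorageV2 --hns true
az storage container create --account-name $STORAGE --name $CONTAINER

# Assign "Storage Blob Data Owner" on the storage account to the managed identity.
STORAGE_ID=$(az storage account show --resource-group $RG --name $STORAGE --query id -o tsv)
az role assignment create --assignee-object-id $MSI_PRINCIPAL --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Owner" --scope $STORAGE_ID

# Key Vault (RBAC authorization) and a secret holding the SQL admin password.
# Adding the secret requires that your own identity has a Key Vault data-plane role such as Key Vault Administrator.
az keyvault create --resource-group $RG --name $KEYVAULT --location $LOC --enable-rbac-authorization true
az keyvault secret set --vault-name $KEYVAULT --name sqlpassword --value "<sql-admin-password>"
KV_ID=$(az keyvault show --resource-group $RG --name $KEYVAULT --query id -o tsv)
az role assignment create --assignee-object-id $MSI_PRINCIPAL --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" --scope $KV_ID

# Azure SQL server and database for the Hive Metastore (avoid "-" in the database name).
az sql server create --resource-group $RG --name $SQL_SERVER --location $LOC \
  --admin-user sqladmin --admin-password "<sql-admin-password>"
az sql db create --resource-group $RG --server $SQL_SERVER --name $SQL_DB

# Allow Azure services and resources to access the SQL server (0.0.0.0 start/end is the "Azure services" rule).
az sql server firewall-rule create --resource-group $RG --server $SQL_SERVER \
  --name AllowAzureServices --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
```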
hdinsight-aks | Prerequisites Subscription | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/prerequisites-subscription.md | + + Title: Subscription prerequisites for Azure HDInsight on AKS. +description: Prerequisite steps to complete on your subscription before working with Azure HDInsight on AKS. ++ Last updated : 08/29/2023+++# Subscription prerequisites +++If you're using Azure subscription first time for HDInsight on AKS, the following features might need to be enabled. ++## Enable features ++1. Sign in to [Azure portal](https://portal.azure.com). + +1. Click the Cloud Shell icon (:::image type="icon" source="./media/prerequisites-subscription/cloud-shell.png" alt-text="Screenshot screenshot showing Cloud Shell icon.":::) at the top right, and select **PowerShell** or **Bash** as your environment depending on the command you use. ++At the next command prompt, enter each of the following commands: + +1. **Register your subscription for 'AKS-AzureKeyVaultSecretsProvider' feature.** ++ ```azurecli + az feature register --name AKS-AzureKeyVaultSecretsProvider --namespace "Microsoft.ContainerService" --subscription <Your Subscription> + ``` + + ```powershell + Register-AzProviderFeature -FeatureName AKS-AzureKeyVaultSecretsProvider -ProviderNamespace Microsoft.ContainerService + ``` ++ **Output:** All requests for this feature should be automatically approved. The state in the response should show as **Registered**. + <br>If you receive a response that the registration is still on-going (state in the response shows as "Registering"), wait for a few minutes. <br>Run the command again in few minutes and the state changes to "Registered" once feature registration is completed. ++1. **Register your subscription for 'EnablePodIdentityPreview' feature.** ++ ```azurecli + az feature register --name EnablePodIdentityPreview --namespace "Microsoft.ContainerService" --subscription <Your Subscription> + ``` + + ```powershell + Register-AzProviderFeature -FeatureName EnablePodIdentityPreview -ProviderNamespace Microsoft.ContainerService + ``` + **Output:** The response indicates the registration is in progress (state in the response shows as "Registering"). It might take a few minutes to register the feature. + <br>Run the command again in few minutes and the state changes to "Registered" once feature registration is completed. ++1. **Register your subscription for 'KubeletDisk' feature.** + + ```azurecli + az feature register --name KubeletDisk --namespace "Microsoft.ContainerService" --subscription <Your Subscription> + ``` + + ```powershell + Register-AzProviderFeature -FeatureName KubeletDisk -ProviderNamespace Microsoft.ContainerService + ``` + + **Output:** The response indicates the registration is in progress (state in the response shows as "Registering"). It might take a few minutes to register the feature. + <br>Run the command again in few minutes and the state changes to "Registered" once feature registration is completed. ++1. **Register with 'Microsoft.ContainerService' provider to propagate the features registered in the previous steps.** + + ```azurecli + az provider register -n Microsoft.ContainerService --subscription <Your Subscription> + ``` + + ```powershell + Register-AzResourceProvider -ProviderNamespace Microsoft.ContainerService + ``` + + **Output:** No response means the feature registration propagated and you can proceed. 
If you receive a response that the registration is still ongoing, wait a few minutes, and then run the command again until you receive no response. ++## Next steps +* [One-click deployment](./get-started.md) |
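To verify that each feature has finished registering before you move on, you can query the registration state directly. The following sketch checks the three features registered above and the resource provider itself; repeat it until every state shows **Registered**.

```azurecli
# Check the registration state of each feature.
for FEATURE in AKS-AzureKeyVaultSecretsProvider EnablePodIdentityPreview KubeletDisk; do
  az feature show \
    --namespace Microsoft.ContainerService \
    --name $FEATURE \
    --query "{feature: name, state: properties.state}" \
    --output table
done

# Confirm the resource provider itself is registered.
az provider show --namespace Microsoft.ContainerService --query registrationState --output tsv
```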
hdinsight-aks | Preview | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/preview.md | + + Title: HDInsight on AKS preview information +description: This article explains what public preview means in HDInsight on AKS. ++ Last updated : 09/05/2023+++# Microsoft HDInsight on AKS preview information ++Azure HDInsight on AKS is currently in public preview and may be substantially modified before it's released. Preview online service products and features aren't complete but are made available on a preview basis so that customers can get early access and provide feedback. ++This article describes the Azure HDInsight on AKS preview state and provides disclaimers related to the preview. ++## Terms of use ++Your use of the Microsoft Azure HDInsight on AKS Cluster Pool or Microsoft Azure HDInsight on AKS Clusters preview experiences and features is governed by the preview online service terms and conditions of the agreement(s) under which you obtained the services and the [supplemental preview terms](https://go.microsoft.com/fwlink/?linkid=2240967). ++Previews are provided "as-is," "with all faults," and "as available," and are excluded from the service level agreements and limited warranty. Customer support may not cover previews. We may change or discontinue previews at any time without notice. We also may choose not to release a preview into "General Availability." ++Previews may be subject to reduced or different security, compliance, and privacy commitments, as further explained in the [Microsoft Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839), [Microsoft Trust Center](https://go.microsoft.com/fwlink/?linkid=2179910), the [Product Terms](https://go.microsoft.com/fwlink/?linkid=2173816), the [Microsoft Products and Services Data Protection Addendum ("DPA")](https://go.microsoft.com/fwlink/?linkid=2153219), and any extra notices provided with the preview. ++## Functionality ++During preview, Microsoft Azure HDInsight on AKS may have limited or restricted functionality. ++## Availability ++During public preview, HDInsight on AKS may not be available in all geographic areas. For more information, see [region availability](./overview.md#region-availability-public-preview). ++ |
hdinsight-aks | Quickstart Create Cluster | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/quickstart-create-cluster.md | + + Title: Create cluster pool and cluster +description: Creating a cluster pool and cluster in HDInsight on AKS. ++ Last updated : 08/29/2023+++# Create cluster pool and cluster +++HDInsight on AKS has the concept of cluster pools and clusters. ++- **Cluster pools** are a logical grouping of clusters and maintain a set of clusters in the same pool, which helps in building robust interoperability across multiple cluster types. It can be created within an existing virtual network or outside a virtual network. ++ A cluster pool in HDInsight on AKS corresponds to one cluster in AKS infrastructure. ++- **Clusters** are individual compute workloads, such as Apache Spark, Apache Flink, or Trino, which can be created in the same cluster pool. ++For creating Apache Spark, Apache Flink, or Trino clusters, you need to first create a cluster pool. ++## Prerequisites ++Ensure that you have completed the [subscription prerequisites](prerequisites-subscription.md) and [resource prerequisites](prerequisites-resources.md) before creating a cluster pool. ++## Create a cluster pool ++1. Sign in to [Azure portal](https://portal.azure.com). + +1. In the Azure portal search bar, type "HDInsight on AKS cluster pool" and select "Azure HDInsight on AKS cluster pools" from the drop-down list. + + :::image type="content" source="./media/quickstart-create-cluster/search-bar.png" alt-text="Diagram showing search bar in Azure portal." border="true" lightbox="./media/quickstart-create-cluster/search-bar.png" + +1. Click **+ Create**. ++ :::image type="content" source="./media/quickstart-create-cluster/create-button.png" alt-text="Diagram showing create button." border="true" lightbox="./media/quickstart-create-cluster/create-button.png"::: + +1. In the **Basics** tab, enter the following information: ++ :::image type="content" source="./media/quickstart-create-cluster/cluster-pool-basic-tab.png" alt-text="Diagram showing cluster pool creation basic tab." border="true" lightbox="./media/quickstart-create-cluster/cluster-pool-basic-tab.png"::: ++ |Property|Description| + ||| + |Subscription| From the drop-down list, select the Azure subscription under which you want to create HDInsight on AKS cluster pool.| + |Resource group|From the drop-down list, select an existing resource group, or select **Create new.**| + |Pool name| Enter the name of the cluster pool to be created. Cluster pool name length can't be more than 26 characters. It must start with an alphabet, end with an alphanumeric character, and must only contain alphanumeric characters and hyphens.| + |Region|From the drop-down list, select the region for the cluster pool. Check [region availability](./overview.md#region-availability-public-preview). For cluster pools in a virtual network, the region for the virtual network and the cluster pool must be same. | + |Cluster pool version|From the drop-down list, select the HDInsight on AKS cluster pool version. | + |Virtual machine|From the drop-down list, select the virtual machine size for the cluster pool based on your requirement.| + |Managed resource group|(Optional) Provide a name for managed resource group. It holds ancillary resources created by HDInsight on AKS.| + ++ Select **Next: Security + networking** to continue. + +1. 
On the **Security + networking** page, provide the following information: + + :::image type="content" source="./media/quickstart-create-cluster/cluster-pool-security-tab.png" alt-text="Diagram showing cluster pool creation network and security tab." border="true" lightbox="./media/quickstart-create-cluster/cluster-pool-security-tab.png"::: ++ |Property|Description| + ||| + |Virtual network (VNet) | From the drop-down list, select a virtual network, which is in the same region as the cluster pool.| + |Subnet | From the drop-down list, select the name of the subnet that you plan to associate with the cluster pool.| ++ Select **Next: Integrations** to continue. + + +1. On the **Integrations** page, provide the following information: ++ :::image type="content" source="./media/quickstart-create-cluster/create-cluster-pool-integration-tab.png" alt-text="Diagram showing cluster pool creation integration tab." border="true" lightbox="./media/quickstart-create-cluster/create-cluster-pool-integration-tab.png"::: ++ |Property|Description| + ||| + |Log Analytics| (Optional) Select this option to enable Log analytics to view insights and logs directly in your cluster by sending metrics and logs to a Log Analytics Workspace.| + |Azure Prometheus| You can enable this option after cluster pool creation is completed. | + + Select **Next: Tags** to continue. + +1. On the **Tags** page, enter any tags (optional) youΓÇÖd like to assign to the cluster pool. + + :::image type="content" source="./media/quickstart-create-cluster/create-cluster-pool-tags-page.png" alt-text="Diagram showing cluster pool creation tags tab." border="true" lightbox="./media/quickstart-create-cluster/create-cluster-pool-tags-page.png"::: ++ | Property | Description| + ||| + |Name | Enter a name (key) that help you identify resources based on settings that are relevant to your organization. For example, "Environment" to track the deployment environment for your resources.| + | Value | Enter the value that helps to relate to the resources. For example, "Production" to identify the resources deployed to production.| + | Resource | Select the applicable resource type.| ++ Select **Next: Review + create** to continue. + +1. On the **Review + create** page, look for the **Validation succeeded** message at the top of the page and then click **Create**. ++ The **Deployment is in process** page is displayed while the cluster pool is being created, and the **Your deployment is complete page** is displayed once the cluster pool is fully deployed and ready for use. ++ :::image type="content" source="./media/quickstart-create-cluster/create-cluster-pool-review-create-page.png" alt-text="Diagram showing cluster pool review and create tab." lightbox="./media/quickstart-create-cluster/create-cluster-review-create-page.png"::: ++ If you navigate away from the page, you can check the status of the deployment by clicking Notifications icon. + + > [!TIP] + > For troubleshooting any deployment errors, you can refer this [page](./create-cluster-error-dictionary.md). ++## Create a cluster ++Once the cluster pool deployment completes, continue to use the Azure portal to create a [Trino](./trino/trino-create-cluster.md#create-a-trino-cluster), [Flink](./flink/flink-create-cluster-portal.md#create-an-apache-flink-cluster), and [Spark](./spark/hdinsight-on-aks-spark-overview.md) cluster. 
++> [!IMPORTANT] +> For creating a cluster in a new cluster pool, assign AKS agentpool MSI "Managed Identity Operator" role on the user-assigned managed identity created as part of resource prerequisites. +> When a user has permission to assign the Azure RBAC roles, it's assigned automatically. +> +> AKS agentpool managed identity is created during cluster pool creation. You can identify the AKS agentpool managed identity by **(your clusterpool name)-agentpool**. +> Follow these steps to [assign the role](/azure/role-based-access-control/role-assignments-portal#step-2-open-the-add-role-assignment-page). ++For a quickstart, refer to the following steps. ++1. When the cluster pool creation completes, click **Go to resource** from the **Your deployment is complete** page or the **Notifications** area. If the **Go to resource option** isn't available, type *HDInsight on AKS cluster pool* in the search bar on the Azure portal, and then select the cluster pool you created. ++1. Click **+ New cluster** from and then provide the following information: ++ :::image type="content" source="./media/quickstart-create-cluster/create-new-cluster.png" alt-text="Screenshot showing create new cluster option."::: + + :::image type="content" source="./media/quickstart-create-cluster/create-cluster-basic-page.png" alt-text="Diagram showing how to create a new cluster." border="true" lightbox="./media/quickstart-create-cluster/create-cluster-basic-page.png"::: + + | Property| Description| + ||| + |Subscription | By default, it's populated with the subscription used for the cluster pool.| + |Resource group| By default, it's populated with the resource group used for the cluster pool.| + |Cluster pool|Represents the cluster pool in which the cluster has to be created. To create a cluster in a different pool, find that cluster pool in the portal and click **+ New cluster**.| + |Region| By default, it's populated with the region used for the cluster pool.| + |Cluster pool version|By default, it's populated with the version used for the cluster pool.| + |HDInsight on AKS version| From the drop-down list, select the HDInsight on AKS version. For more information, see [versioning](./versions.md).| + |Cluster type | From the drop-down list, select the type of Cluster you want to create: Trino, Flink, or Spark.| + |Cluster package| Select the cluster package with component version available for the selected cluster type. | + |Cluster name|Enter the name of the new cluster.| + |User-assigned managed identity | Select the managed identity to use with the cluster.| + |Storage account (ADLS Gen2) | Select a storage account and a container that is the default location for cluster logs and other output. It's mandatory for Apache Flink and Spark cluster type.| + |Virtual network (VNet) | The virtual network for the cluster. It's derived from the cluster pool.| + |Subnet|The virtual network subnet for the cluster. It's derived from the cluster pool.| + + Click **Next: Configuration** to continue. ++1. 
On the **Configuration** page, provide the following information: ++ :::image type="content" source="./media/quickstart-create-cluster/configuration-and-pricing-tab.png" alt-text="Diagram showing configuration tab."::: + ++ |Property|Description| + ||| + |Head node size| This value is same as the worker node size.| + |Number of head nodes|This value is set by default based on the cluster type.| + |Worker node size| From the drop-down list, select the recommended SKU or you can choose the SKU available in your subscription by clicking **Select VM size**.| + |Number of worker nodes|Select the number of worker nodes required for your cluster.| + |Autoscale|(Optional) Select this option to enable the autoscale capability| + |Secure shell (SSH) configuration|(Optional) Select this option to enable SSH node. By enabling SSH, more VM nodes are created.| ++ > [!NOTE] + > You will see extra section to provide service configurations for Apache Flink clusters. ++ Click **Next: Integrations** to continue. ++1. On the **Integrations** page, provide the following information: ++ :::image type="content" source="./media/quickstart-create-cluster/cluster-integration-tab.png" alt-text="Diagram showing integration tab."::: ++ |Property|Description| + ||| + |Log Analytics|(Optional) Select this option to enable Log analytics to view insights and logs directly in your cluster by sending metrics and logs to a Log Analytics Workspace.| + |Azure Prometheus|(Optional) Select this option to enable Azure Managed Prometheus to view Insights and Logs directly in your cluster by sending metrics and logs to an Azure Monitor workspace.| ++ > [!NOTE] + > To enable Log Analytics and Azure Prometheus, it should be first enabled at the cluster pool level. + + Click **Next: Tags** to continue. + +1. On the **Tags** page, enter any tags(optional) youΓÇÖd like to assign to the cluster. + + :::image type="content" source="./media/quickstart-create-cluster/create-cluster-tags-page.png" alt-text="Screenshot showing tags page."::: ++ | Property | Description| + ||| + |Name | Enter a name (key) that help you identify resources based on settings that are relevant to your organization. "Environment" to track the deployment environment for your resources.| + | Value | Enter the value that helps to relate to the resources. "Production" to identify the resources deployed to production.| + | Resource | Select the applicable resource type.| ++ Select **Next: Review + create** to continue. ++1. On the **Review + create** page, look for the **Validation succeeded** message at the top of the page and then click **Create**. ++ :::image type="content" source="./media/quickstart-create-cluster/create-cluster-review-create-page.png" alt-text="Diagram showing cluster review and create tab." lightbox="./media/quickstart-create-cluster/create-cluster-review-create-page.png"::: ++ The **Deployment is in process** page is displayed while the cluster is being created, and the **"Your deployment is complete"** page is displayed once the cluster is fully deployed and ready for use. ++ > [!TIP] + > For troubleshooting any deployment errors, you can refer to this [page](./create-cluster-error-dictionary.md). |
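If the **Managed Identity Operator** role assignment called out in the important note above isn't applied automatically, you can grant it yourself. The following Azure CLI sketch assumes the agentpool identity (named `<your clusterpool name>-agentpool`) is visible in your subscription and uses placeholder resource names.

```azurecli
# Placeholder values; adjust to your environment.
POOL_NAME="<your-clusterpool-name>"
USER_MSI_ID=$(az identity show --resource-group <your-rg> --name <your-user-assigned-msi> --query id -o tsv)

# Find the AKS agentpool identity created with the cluster pool.
AGENTPOOL_OBJECT_ID=$(az identity list \
  --query "[?name=='${POOL_NAME}-agentpool'].principalId | [0]" -o tsv)

# Grant the agentpool identity "Managed Identity Operator" on your user-assigned managed identity.
az role assignment create \
  --assignee-object-id $AGENTPOOL_OBJECT_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Managed Identity Operator" \
  --scope $USER_MSI_ID
```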
hdinsight-aks | Hdinsight Aks Release Notes Archive | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/release-notes/hdinsight-aks-release-notes-archive.md | + + Title: Archived release notes for Azure HDInsight on AKS +description: Archived release notes for Azure HDInsight on AKS. Get development tips and details for Trino, Flink, and Spark. ++ Last updated : 08/29/2023+++# Azure HDInsight on AKS archived release notes ++There are currently no archived release notes for HDInsight on AKS. ++++ |
hdinsight-aks | Hdinsight Aks Release Notes | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/release-notes/hdinsight-aks-release-notes.md | + + Title: Release notes for Azure HDInsight on AKS +description: Latest release notes for Azure HDInsight on AKS. Get development tips and details for Trino, Flink, Spark, and more. ++ Last updated : 08/29/2023+++# Azure HDInsight on AKS release notes ++This article provides information about the **most recent** Azure HDInsight on AKS release updates. ++## Summary ++Azure HDInsight on AKS is a new version of HDInsight, which runs on Kubernetes and brings the best of open source to Kubernetes. It's gaining popularity among enterprise customers for open-source analytics on Azure Kubernetes Service. ++## Release date: October 10, 2023 ++**This release applies to the following:** ++- Cluster Pool Version: 1.0 +- Cluster Version: 1.0.6 ++> [!NOTE] +> To understand HDInsight on AKS versioning and support, see the **[versioning page](../versions.md)**. ++You can refer to the [What's new](../whats-new.md) page for details of the features currently in public preview for this release. ++## Release Information ++### Operating System version ++- Mariner OS 2.0 ++**Workload versions** ++|Workload|Version| +| -- | -- | +|Trino | 410 | +|Flink | 1.16 | +|Apache Spark | 3.3.1 | ++**Supported Java and Scala versions** ++|Java|Scala| +| -- | -- | +|8, JDK 1.8.0_345 |2.12.10 | ++The preview is available in the following [regions](../overview.md#region-availability-public-preview). ++If you have any more questions, contact [Azure Support](https://ms.portal.azure.com/#view/Microsoft_Azure_Support/HelpAndSupportBlade/~/overview) or refer to the [Support options](../hdinsight-aks-support-help.md) page. ++### Next steps ++- [Azure HDInsight on AKS: Frequently asked questions](../faq.md) +- [Create a cluster pool and cluster](../quickstart-create-cluster.md) |
hdinsight-aks | Required Outbound Traffic | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/required-outbound-traffic.md | + + Title: Outbound traffic on HDInsight on AKS +description: Learn required outbound traffic on HDInsight on AKS. ++ Last updated : 08/29/2023+++# Required outbound traffic for HDInsight on AKS +++This article outlines the networking information to help manage the network policies at enterprise and make necessary changes to the network security groups (NSGs) for smooth functioning of HDInsight on AKS. ++If you use firewall to control outbound traffic to your HDInsight on AKS cluster, you must ensure that your cluster can communicate with critical Azure services. +Some of the security rules for these services are region-specific, and some of them apply to all Azure regions. ++You need to configure the following network and application security rules in your firewall to allow outbound traffic. ++## Common traffic ++|Type| Destination Endpoint | Protocol | Port | Azure Firewall Rule Type | Use | +|-|--|-||--| -| +| ServiceTag | AzureCloud.`<Region>` | UDP | 1194 | Network security rule| Tunneled secure communication between the nodes and the control plane.| +| ServiceTag | AzureCloud.`<Region>` | TCP | 9000 | Network security rule|Tunneled secure communication between the nodes and the control plane.| +| FQDN Tag| AzureKubernetesService | HTTPS | 443 |Application security rule| Required by AKS Service.| +| Service Tag | AzureMonitor | TCP | 443 |Application security rule| Required for integration with Azure Monitor.| +| FQDN| hiloprodrpacr00.azurecr.io|HTTPS|443|Application security rule| Downloads metadata info of the docker image for setup of HDInsight on AKS and monitoring.| +| FQDN| *.blob.core.windows.net|HTTPS|443|Application security rule| Monitoring and setup of HDInsight on AKS.| +| FQDN|graph.microsoft.com|HTTPS|443|Application security rule| Authentication.| +| FQDN|*.servicebus.windows.net|HTTPS|443|Application security rule| Monitoring.| +| FQDN|*.table.core.windows.net|HTTPS|443|Application security rule| Monitoring. +| FQDN|gcs.prod.monitoring.core.windows.net|HTTPS|443|Application security rule| Monitoring.| +| FQDN|API Server FQDN (available once AKS cluster is created)|TCP|443|Network security rule| Required as the running pods/deployments use it to access the API Server. You can get this information from the AKS cluster running behind the cluster pool. For more information, see [how to get API Server FQDN](secure-traffic-by-firewall-azure-portal.md#get-aks-cluster-details-created-behind-the-cluster-pool) using Azure portal.| +++## Cluster specific traffic ++The below section outlines any specific network traffic, which a cluster shape requires, to help enterprises plan and update the network rules accordingly. ++### Trino ++| Type | Destination Endpoint | Protocol | Port | Azure Firewall Rule Type |Use | +||--|-||--|-| +| FQDN|*.dfs.core.windows.net|HTTPS|443|Application security rule|Required if Hive is enabled. It's user's own Storage account, such as contosottss.dfs.core.windows.net| +| FQDN|*.database.windows.net|mysql|1433|Application security rule|Required if Hive is enabled. It's user's own SQL server, such as contososqlserver.database.windows.net| +|Service Tag | Sql.`<Region>`|TCP|11000-11999|Network security rule|Required if Hive is enabled. It's used in connecting to SQL server. 
It's recommended to allow outbound communication from the client to all Azure SQL IP addresses in the region on ports in the range of 11000 to 11999. Use the Service Tags for SQL to make this process easier to manage. When using the Redirect connection policy, refer to the [Azure IP Ranges and Service Tags - Public Cloud](https://www.microsoft.com/download/details.aspx?id=56519) for a list of your region's IP addresses to allow.| ++### Spark ++| Type | Destination Endpoint| Protocol | Port | Azure Firewall Rule Type |Use | +|||-||--|| +| FQDN|*.dfs.core.windows.net|HTTPS|443|Application security rule|Spark Azure Data Lake Storage Gen2. It's the user's storage account, such as contosottss.dfs.core.windows.net| +|Service Tag | Storage.`<Region>`|TCP|445|Network security rule|Uses the SMB protocol to connect to Azure Files| +| FQDN|*.database.windows.net|mysql|1433|Application security rule|Required if Hive is enabled. It's the user's own SQL server, such as contososqlserver.database.windows.net| +|Service Tag | Sql.`<Region>`|TCP|11000-11999|Network security rule|Required if Hive is enabled. It's used to connect to SQL server. It's recommended to allow outbound communication from the client to all Azure SQL IP addresses in the region on ports in the range of 11000 to 11999. Use the Service Tags for SQL to make this process easier to manage. When using the Redirect connection policy, refer to the [Azure IP Ranges and Service Tags - Public Cloud](https://www.microsoft.com/download/details.aspx?id=56519) for a list of your region's IP addresses to allow. | ++### Apache Flink ++|Type|Destination Endpoint|Protocol|Port|Azure Firewall Rule Type |Use| +|-|-|-|-|-|--| +|FQDN|`*.dfs.core.windows.net`|HTTPS|443|Application security rule|Flink Azure Data Lake Storage Gen2. It's the user's storage account, such as contosottss.dfs.core.windows.net| ++## Next steps +* [How to use firewall to control outbound traffic and apply rules](./secure-traffic-by-firewall.md). +* [How to use NSG to restrict traffic](./secure-traffic-by-nsg.md). |
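As an illustration of how these rules translate into Azure Firewall configuration, the following Azure CLI sketch creates a few of the network and application rules from the common traffic table. The firewall name, resource group, collection names, priorities, and source addresses are placeholders; in a real deployment, restrict the source addresses to your subnet range and add the remaining rules the same way.

```azurecli
# Placeholder names; adjust region, source addresses, and priorities to your environment.
RG="hdiaks-egress-rg"
FWNAME="hdiaks-egress-fw"
REGION="WestEurope"

# Network rules for tunneled communication between the nodes and the AKS control plane (UDP 1194, TCP 9000).
az network firewall network-rule create \
  --resource-group $RG --firewall-name $FWNAME \
  --collection-name aksfwnr --name apiudp --priority 100 --action Allow \
  --source-addresses "*" --destination-addresses "AzureCloud.$REGION" \
  --protocols UDP --destination-ports 1194

az network firewall network-rule create \
  --resource-group $RG --firewall-name $FWNAME \
  --collection-name aksfwnr --name apitcp --priority 100 --action Allow \
  --source-addresses "*" --destination-addresses "AzureCloud.$REGION" \
  --protocols TCP --destination-ports 9000

# Application rule for FQDNs used for setup, monitoring, and authentication of HDInsight on AKS.
az network firewall application-rule create \
  --resource-group $RG --firewall-name $FWNAME \
  --collection-name aksfwar --name hdiaks-fqdns --priority 100 --action Allow \
  --source-addresses "*" \
  --protocols Https=443 \
  --target-fqdns hiloprodrpacr00.azurecr.io "*.blob.core.windows.net" graph.microsoft.com
```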
hdinsight-aks | Secure Traffic By Firewall Azure Portal | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/secure-traffic-by-firewall-azure-portal.md | + + Title: Use firewall to restrict outbound traffic on HDInsight on AKS, using Azure portal +description: Learn how to secure traffic using firewall on HDInsight on AKS using Azure portal ++ Last updated : 08/3/2023+++# Use firewall to restrict outbound traffic using Azure portal +++When an enterprise wants to use their own virtual network for the cluster deployments, securing the traffic of the virtual network becomes important. +This article provides the steps to secure outbound traffic from your HDInsight on AKS cluster via Azure Firewall using Azure portal. ++The following diagram illustrates the example used in this article to simulate an enterprise scenario: ++++## Create a virtual network and subnets + + 1. Create a virtual network and two subnets. + + In this step, set up a virtual network and two subnets for configuring the egress specifically. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-virtual-network-step-2.png" alt-text="Diagram showing creating a virtual network in the resource group using Azure portal step number 2." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-virtual-network-step-2.png"::: + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-virtual-network-step-3.png" alt-text="Diagram showing creating a virtual network and setting IP address using Azure portal step 3." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-virtual-network-step-3.png"::: + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-virtual-network-step-4.png" alt-text="Diagram showing creating a virtual network and setting IP address using Azure portal in step number four." border= "true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-virtual-network-step-4.png"::: + + > [!IMPORTANT] + > * If you add NSG in subnet , you need to add certain outbound and inbound rules manually. Follow [use NSG to restrict the traffic](./secure-traffic-by-nsg.md). + > * Don't associate subnet `hdiaks-egress-subnet` with a route table because HDInsight on AKS creates cluster pool with default outbound type and can't create the cluster pool in a subnet already associated with a route table. ++## Create HDInsight on AKS cluster pool using Azure portal ++ 1. Create a cluster pool. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-pool-step-5.png" alt-text="Diagram showing creating a HDInsight on AKS cluster pool using Azure portal in step number five." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-pool-step-5.png"::: + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-pool-networking-step-6.png" alt-text="Diagram showing creating a HDInsight on AKS cluster pool networking using Azure portal step 6." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-pool-networking-step-6.png"::: + + 2. When HDInsight on AKS cluster pool is created, you can find a route table in subnet `hdiaks-egress-subnet`. 
+ + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-pool-networking-step-7.png" alt-text="Diagram showing creating a HDInsight on AKS cluster pool networking using Azure portal step 7." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-pool-networking-step-7.png"::: ++### Get AKS cluster details created behind the cluster pool ++You can search your cluster pool name in portal, and go to AKS cluster. For example, +++Get AKS API Server details. +++## Create firewall ++ 1. Create firewall using Azure portal. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-firewall-step-10.png" alt-text="Diagram showing creating a firewall using Azure portal step 10." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-firewall-step-10.png"::: + + 3. Enable DNS proxy server of firewall. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-firewall-dns-proxy-step-11.png" alt-text="Diagram showing creating a firewall and DNS proxy using Azure portal step 11." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-firewall-dns-proxy-step-11.png"::: + + 5. Once the firewall is created, find the firewall internal IP and public IP. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-firewall-dns-proxy-step-12.png" alt-text="Diagram showing creating a firewall and DNS proxy internal and public IP using Azure portal step 12." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-firewall-dns-proxy-step-12.png"::: ++### Add network and application rules to the firewall ++ 1. Create the network rule collection with following rules. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-firewall-rules-step-13.png" alt-text="Diagram showing adding firewall rules using Azure portal step 13." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-firewall-rules-step-13.png"::: + + 2. Create the application rule collection with following rules. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-firewall-rules-step-14.png" alt-text="Diagram showing adding firewall rules using Azure portal step 14." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-firewall-rules-step-14.png"::: ++### Create route in the route table to redirect the traffic to firewall ++Add new routes to route table to redirect the traffic to the firewall. ++++## Create cluster ++In the previous steps, we have routed the traffic to firewall. ++The following steps provide details about the specific network and application rules needed by each cluster type. You can refer to the cluster creation pages for creating [Apache Flink](./flink/flink-create-cluster-portal.md), [Trino](./trino/trino-create-cluster.md), and [Apache Spark](./spark/hdinsight-on-aks-spark-overview.md) clusters based on your need. ++> [!IMPORTANT] +> Before creating the cluster, make sure to add the following cluster specific rules to allow the traffic. ++### Trino ++ 1. Add the following rules to application rule collection `aksfwar`. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-trino-step-16.png" alt-text="Diagram showing adding application rules for Trino Cluster using Azure portal step 16." 
border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-trino-step-16.png"::: + + 2. Add the following rule to network rule collection `aksfwnr`. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-trino-1-step-16.png" alt-text="Diagram showing how to add application rules to network rule collection for Trino Cluster using Azure portal step 16." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-trino-1-step-16.png"::: + + > [!NOTE] + > Change the `Sql.<Region>` to your region as per your requirement. For example: `Sql.WestEurope` ++### Apache Flink ++ 1. Add the following rule to application rule collection `aksfwar`. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-flink-step-17.png" alt-text="Diagram showing adding application rules for Apache Flink Cluster using Azure portal step 17." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-flink-step-17.png"::: ++### Apache Spark ++ 1. Add the following rules to application rule collection `aksfwar`. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-spark-step-18.png" alt-text="Diagram showing adding application rules for Apache Flink Cluster using Azure portal step 18." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal/create-cluster-spark-1-step-18.png"::: + + 2. Add the following rules to network rule collection `aksfwnr`. + + :::image type="content" source="./media/secure-traffic-by-firewall-azure-portal/create-cluster-spark-1-step-18.png" alt-text="Diagram showing how to add application rules for Apache Flink Cluster using Azure portal step 18." border="true" lightbox="./media/secure-traffic-by-firewall-azure-portal//create-cluster-spark-1-step-18.png"::: + + > [!NOTE] + > 1. Change the `Sql.<Region>` to your region as per your requirement. For example: `Sql.WestEurope` + > 2. Change the `Storage.<Region>` to your region as per your requirement. For example: `Storage.WestEurope` +++## Solving symmetric routing issue ++The following steps allow us to request cluster by cluster load balancer ingress service and ensure the network response traffic doesn't flow to firewall. ++Add a route to the route table to redirect the response traffic to your client IP to Internet and then, you can reach the cluster directly. +++ If you can't reach the cluster and have configured NSG, follow [use NSG to restrict the traffic](./secure-traffic-by-nsg.md) to allow the traffic. ++> [!TIP] +> If you want to permit more traffic, you can configure it over the firewall. ++## How to Debug +If you find the cluster works unexpectedly, you can check the firewall logs to find which traffic is blocked. |
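The two routing changes shown in the portal screenshots (sending outbound traffic to the firewall and returning the response traffic for your client IP straight to the internet) can also be scripted. A minimal sketch, assuming the route table that HDInsight on AKS created for the `hdiaks-egress-subnet` and placeholder IP addresses:

```azurecli
# Placeholder names; align these with the route table created for the hdiaks-egress-subnet.
RG="hdiaks-egress-rg"
ROUTE_TABLE="<route-table-in-hdiaks-egress-subnet>"
FW_PRIVATE_IP="<firewall-internal-ip>"
CLIENT_IP="<your-client-public-ip>"

# Send all outbound traffic from the subnet to the firewall.
az network route-table route create \
  --resource-group $RG --route-table-name $ROUTE_TABLE \
  --name to-firewall --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance --next-hop-ip-address $FW_PRIVATE_IP

# Return traffic for your client IP goes directly to the internet, avoiding the symmetric routing issue.
az network route-table route create \
  --resource-group $RG --route-table-name $ROUTE_TABLE \
  --name to-client --address-prefix ${CLIENT_IP}/32 \
  --next-hop-type Internet
```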
hdinsight-aks | Secure Traffic By Firewall | https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/hdinsight-aks/secure-traffic-by-firewall.md | + + Title: Use firewall to restrict outbound traffic on HDInsight on AKS using Azure CLI +description: Learn how to secure traffic using firewall on HDInsight on AKS using Azure CLI ++ Last updated : 08/3/2023+++# Use firewall to restrict outbound traffic using Azure CLI +++When an enterprise wants to use its own virtual network for cluster deployments, securing the traffic of the virtual network becomes important. +This article provides the steps to secure outbound traffic from your HDInsight on AKS cluster via Azure Firewall using [Azure CLI](/azure/cloud-shell/quickstart?tabs=azurecli). ++The following diagram illustrates the example used in this article to simulate an enterprise scenario: +++The example demonstrated in this article uses **Azure Cloud Shell**. ++## Define the variables ++**Copy and execute in the Azure Cloud Shell to set the values of these variables.** ++```azurecli +PREFIX="hdiaks-egress" +RG="${PREFIX}-rg" +LOC="eastus" +HDIAKS_CLUSTER_POOL=${PREFIX} +VNET_NAME="${PREFIX}-vnet" +HDIAKS_SUBNET_NAME="${PREFIX}-subnet" +# DO NOT CHANGE FWSUBNET_NAME - This is currently a requirement for Azure Firewall. +FWSUBNET_NAME="AzureFirewallSubnet" +FWNAME="${PREFIX}-fw" +FWPUBLICIP_NAME="${PREFIX}-fwpublicip" +FWIPCONFIG_NAME="${PREFIX}-fwconfig" +FWROUTE_NAME="${PREFIX}-fwrn" +FWROUTE_NAME_INTERNET="${PREFIX}-fwinternet" +``` ++## Create a virtual network and subnets ++1. Create a resource group using the `az group create` command. ++ ```azurecli + az group create --name $RG --location $LOC + ``` ++1. Create a virtual network and two subnets. + + 1. Virtual network with a subnet for the HDInsight on AKS cluster pool. ++ ```azurecli + az network vnet create \ + --resource-group $RG \ + --name $VNET_NAME \ + --location $LOC \ + --address-prefixes 10.0.0.0/8 \ + --subnet-name $HDIAKS_SUBNET_NAME \ + --subnet-prefix 10.1.0.0/16 + ``` + + 1. Subnet for Azure Firewall. + ```azurecli + az network vnet subnet create \ + --resource-group $RG \ + --vnet-name $VNET_NAME \ + --name $FWSUBNET_NAME \ + --address-prefix 10.2.0.0/16 + ``` + > [!Important] + > 1. If you add an NSG to the subnet `HDIAKS_SUBNET_NAME`, you need to add certain outbound and inbound rules manually. Follow [use NSG to restrict the traffic](./secure-traffic-by-nsg.md). + > 1. Don't associate the subnet `HDIAKS_SUBNET_NAME` with a route table, because HDInsight on AKS creates the cluster pool with the default outbound type and can't create the cluster pool in a subnet that's already associated with a route table. ++## Create HDInsight on AKS cluster pool using Azure portal ++ 1. Create a cluster pool. + + :::image type="content" source="./media/secure-traffic-by-firewall/basic-tab.png" alt-text="Diagram showing the cluster pool basic tab." border="true" lightbox="./media/secure-traffic-by-firewall/basic-tab.png"::: + + :::image type="content" source="./media/secure-traffic-by-firewall/security-tab.png" alt-text="Diagram showing the security tab." border="true" lightbox="./media/secure-traffic-by-firewall/security-tab.png"::: + + 1. When the HDInsight on AKS cluster pool is created, you can find a route table in the subnet `HDIAKS_SUBNET_NAME`. + + :::image type="content" source="./media/secure-traffic-by-firewall/route-table.png" alt-text="Diagram showing the route table." border="true" lightbox="./media/secure-traffic-by-firewall/route-table.png"::: +
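If you'd rather confirm from Cloud Shell which route table the cluster pool attached to the subnet, a quick check along these lines should work, reusing the variables defined earlier. The `routeTable.id` query here is an illustrative choice; the next section queries `routeTable.resourceGroup` from the same subnet object.

```azurecli
# Show the route table that HDInsight on AKS associated with the cluster pool subnet.
az network vnet subnet show \
    --resource-group $RG \
    --vnet-name $VNET_NAME \
    --name $HDIAKS_SUBNET_NAME \
    --query "routeTable.id" -o tsv
```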
border="true" lightbox="./media/secure-traffic-by-firewall/route-table.png"::: ++### Get AKS cluster details created behind the cluster pool ++ Follow the steps to get the AKS cluster information, which is useful in the subsequent steps. + + ```azurecli + AKS_MANAGED_RG=$(az network vnet subnet show --name $HDIAKS_SUBNET_NAME --vnet-name $VNET_NAME --resource-group $RG --query routeTable.resourceGroup -o tsv) ++ AKS_ID=$(az group show --name $AKS_MANAGED_RG --query managedBy -o tsv) ++ HDIAKS_MANAGED_RG=$(az resource show --ids $AKS_ID --que |