Updates from: 08/09/2024 01:05:45
Service Microsoft Docs article Related commit history on GitHub Change details
ai-services Copy Move Projects https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/custom-vision-service/copy-move-projects.md
After you've created and trained a Custom Vision project, you may want to copy your project to another resource. If your app or business depends on a Custom Vision project, we recommend you copy your model to another Custom Vision account in another region. Then if a regional outage occurs, you can access your project in the region where it was copied.
-The **[ExportProject](/rest/api/customvision/training/projects/export?view=rest-customvision-training-v3.3&tabs=HTTP)** and **[ImportProject](/rest/api/customvision/training/projects/import?view=rest-customvision-training-v3.3&tabs=HTTP)** APIs enable this scenario by allowing you to copy projects from one Custom Vision account into others. This guide shows you how to use these REST APIs with cURL. You can also use an HTTP request service, like the [REST Client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for Visual Studio Code, to issue the requests.
+The **[ExportProject](/rest/api/customvision/projects/export)** and **[ImportProject](/rest/api/customvision/projects/import)** APIs enable this scenario by allowing you to copy projects from one Custom Vision account into others. This guide shows you how to use these REST APIs with cURL. You can also use an HTTP request service, like the [REST Client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for Visual Studio Code, to issue the requests.
> [!TIP]
> For an example of this scenario using the Python client library, see the [Move Custom Vision Project](https://github.com/Azure-Samples/custom-vision-move-project/tree/master/) repository on GitHub.
The process for copying a project consists of the following steps:
## Get the project ID
-First call **[GetProjects](/rest/api/customvision/training/projects/get?view=rest-customvision-training-v3.3&tabs=HTTP)** to see a list of your existing Custom Vision projects and their IDs. Use the training key and endpoint of your source account.
+First call **[GetProjects](/rest/api/customvision/projects/get)** to see a list of your existing Custom Vision projects and their IDs. Use the training key and endpoint of your source account.
```curl curl -v -X GET "{endpoint}/customvision/v3.3/Training/projects"
You'll get a `200/OK` response with a list of projects and their metadata in the
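If you prefer Python to cURL, the same call can be made with the `requests` library. This is a minimal sketch; the placeholder endpoint and training-key values are assumptions to replace with your source resource's values.

```python
# Minimal sketch (assumes the requests package and a Custom Vision training key).
import requests

SOURCE_ENDPOINT = "https://<your-source-resource>.cognitiveservices.azure.com"
SOURCE_TRAINING_KEY = "<your-source-training-key>"

response = requests.get(
    f"{SOURCE_ENDPOINT}/customvision/v3.3/Training/projects",
    headers={"Training-key": SOURCE_TRAINING_KEY},
)
response.raise_for_status()

# Each project entry includes an ID and a name; note the ID of the project to copy.
for project in response.json():
    print(project["id"], project["name"])
```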
## Export the project
-Call **[ExportProject](/rest/api/customvision/training/projects/export?view=rest-customvision-training-v3.3&tabs=HTTP)** using the project ID and your source training key and endpoint.
+Call **[ExportProject](/rest/api/customvision/projects/export)** using the project ID and your source training key and endpoint.
```curl curl -v -X GET "{endpoint}/customvision/v3.3/Training/projects/{projectId}/export"
You'll get a `200/OK` response with metadata about the exported project and a re
## Import the project
-Call **[ImportProject](/rest/api/customvision/training/projects/import?view=rest-customvision-training-v3.3&tabs=HTTP)** using your target training key and endpoint, along with the reference token. You can also give your project a name in its new account.
+Call **[ImportProject](/rest/api/customvision/projects/import)** using your target training key and endpoint, along with the reference token. You can also give your project a name in its new account.
```curl curl -v -G -X POST "{endpoint}/customvision/v3.3/Training/projects/import"
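As a hedged Python sketch of the full copy flow, the export and import calls can be chained. The `token` response field and the `token`/`name` query parameters are assumptions based on the ExportProject and ImportProject reference; the placeholder endpoints and keys are yours to substitute.

```python
# Hedged sketch: export from the source resource, then import into the target one.
import requests

SOURCE_ENDPOINT = "https://<your-source-resource>.cognitiveservices.azure.com"
SOURCE_TRAINING_KEY = "<your-source-training-key>"
TARGET_ENDPOINT = "https://<your-target-resource>.cognitiveservices.azure.com"
TARGET_TRAINING_KEY = "<your-target-training-key>"
project_id = "<source-project-id>"

# Export the project and capture the reference token from the response.
export = requests.get(
    f"{SOURCE_ENDPOINT}/customvision/v3.3/Training/projects/{project_id}/export",
    headers={"Training-key": SOURCE_TRAINING_KEY},
)
export.raise_for_status()
reference_token = export.json()["token"]  # field name assumed from the API reference

# Import the project into the target account, optionally giving it a new name.
imported = requests.post(
    f"{TARGET_ENDPOINT}/customvision/v3.3/Training/projects/import",
    headers={"Training-key": TARGET_TRAINING_KEY},
    params={"token": reference_token, "name": "my-copied-project"},
)
imported.raise_for_status()
print("New project ID:", imported.json()["id"])
```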
ai-services Export Delete Data https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/custom-vision-service/export-delete-data.md
# View or delete user data in Custom Vision
-Custom Vision collects user data to operate the service, but customers have full control to viewing and delete their data using the Custom Vision [Training APIs](https://go.microsoft.com/fwlink/?linkid=865446).
+Custom Vision collects user data to operate the service, but customers have full control to view and delete their data using the Custom Vision [Training APIs](/rest/api/customvision/train-project).
[!INCLUDE [GDPR-related guidance](~/reusable-content/ce-skilling/azure/includes/gdpr-intro-sentence.md)]
To learn how to view or delete different kinds of user data in Custom Vision, se
| Data | View operation | Delete operation |
| - | - | - |
-| Account info (Keys) | [GetAccountInfo](https://go.microsoft.com/fwlink/?linkid=865446) | Delete using Azure portal (for Azure Subscriptions). Or use **Delete Your Account** button in [CustomVision.ai](https://customvision.ai) settings page (for Microsoft Account Subscriptions) |
-| Iteration details | [GetIteration](https://go.microsoft.com/fwlink/?linkid=865446) | [DeleteIteration](https://go.microsoft.com/fwlink/?linkid=865446) |
-| Iteration performance details | [GetIterationPerformance](https://go.microsoft.com/fwlink/?linkid=865446) | [DeleteIteration](https://go.microsoft.com/fwlink/?linkid=865446) |
-| List of iterations | [GetIterations](https://go.microsoft.com/fwlink/?linkid=865446) | [DeleteIteration](https://go.microsoft.com/fwlink/?linkid=865446) |
-| Projects and project details | [GetProject](https://go.microsoft.com/fwlink/?linkid=865446) and [GetProjects](https://go.microsoft.com/fwlink/?linkid=865446) | [DeleteProject](https://go.microsoft.com/fwlink/?linkid=865446) |
-| Image tags | [GetTag](https://go.microsoft.com/fwlink/?linkid=865446) and [GetTags](https://go.microsoft.com/fwlink/?linkid=865446) | [DeleteTag](https://go.microsoft.com/fwlink/?linkid=865446) |
-| Images | [GetTaggedImages](https://go.microsoft.com/fwlink/?linkid=865446) (provides uri for image download) and [GetUntaggedImages](https://go.microsoft.com/fwlink/?linkid=865446) (provides uri for image download) | [DeleteImages](https://go.microsoft.com/fwlink/?linkid=865446) |
-| Exported iterations | [GetExports](https://go.microsoft.com/fwlink/?linkid=865446) | Deleted upon account deletion |
+| Account info (Keys) | [GetAccountInfo](/rest/api/aiservices/accountmanagement/accounts/get) | Delete using Azure portal (for Azure Subscriptions). Or use **Delete Your Account** button in [CustomVision.ai](https://customvision.ai) settings page (for Microsoft Account Subscriptions) |
+| Iteration details | [GetIteration](/rest/api/customvision/get-iteration) | [DeleteIteration](/rest/api/customvision/delete-iteration) |
+| Iteration performance details | [GetIterationPerformance](/rest/api/customvision/get-iteration-performance) | [DeleteIteration](/rest/api/customvision/delete-iteration) |
+| List of iterations | [GetIterations](/rest/api/customvision/get-iterations) | [DeleteIteration](/rest/api/customvision/delete-iteration) |
+| Projects and project details | [GetProject](/rest/api/customvision/get-project) and [GetProjects](/rest/api/customvision/get-projects) | [DeleteProject](/rest/api/customvision/delete-project) |
+| Image tags | [GetTag](/rest/api/customvision/get-tag) and [GetTags](/rest/api/customvision/get-tags) | [DeleteTag](/rest/api/customvision/delete-tag) |
+| Images | [GetTaggedImages](/rest/api/customvision/get-tagged-images) (provides uri for image download) and [GetUntaggedImages](/rest/api/customvision/get-untagged-images) (provides uri for image download) | [DeleteImages](/rest/api/customvision/delete-images) |
+| Exported iterations | [GetExports](/rest/api/customvision/get-exports) | Deleted upon account deletion |
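As a rough illustration of the operations in the table, the following hedged Python sketch lists projects and then deletes one project and all of its associated data through the training REST API; the endpoint, key, and project ID placeholders are assumptions to substitute with your own values.

```python
# Hedged sketch: view projects, then permanently delete one and its data.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
TRAINING_KEY = "<your-training-key>"
headers = {"Training-key": TRAINING_KEY}

# View operation: GetProjects.
projects = requests.get(
    f"{ENDPOINT}/customvision/v3.3/Training/projects", headers=headers
)
projects.raise_for_status()
for project in projects.json():
    print(project["id"], project["name"])

# Delete operation: DeleteProject. This call is irreversible.
project_id = "<project-id-to-delete>"
requests.delete(
    f"{ENDPOINT}/customvision/v3.3/Training/projects/{project_id}",
    headers=headers,
).raise_for_status()
```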
ai-services Limits And Quotas https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/custom-vision-service/limits-and-quotas.md
There are two tiers of keys for the Custom Vision service. You can sign up for a
|How long prediction images stored|30 days|30 days|
|[Prediction](/rest/api/customvision/predictions) operations with storage (Transactions Per Second)|2|10|
|[Prediction](/rest/api/customvision/predictions) operations without storage (Transactions Per Second)|2|20|
-|[TrainProject](https://go.microsoft.com/fwlink/?linkid=865446) (API calls Per Second)|2|10|
-|[Other API calls](https://go.microsoft.com/fwlink/?linkid=865446) (Transactions Per Second)|10|10|
+|[TrainProject](/rest/api/customvision/train-project/train-project) (API calls Per Second)|2|10|
+|[Other API calls](/rest/api/custom-vision) (Transactions Per Second)|10|10|
|Accepted image types|jpg, png, bmp, gif|jpg, png, bmp, gif|
|Min image height/width in pixels|256 (see note)|256 (see note)|
|Max image height/width in pixels|10,240|10,240|
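The transactions-per-second limits above are enforced by the service. A minimal, hedged sketch of client-side handling, assuming the service returns HTTP 429 when a limit is exceeded, is to retry with exponential backoff:

```python
# Hedged sketch: retry with exponential backoff when a rate limit (HTTP 429) is hit.
import time
import requests

def get_with_backoff(url, headers, max_retries=5):
    delay = 1.0
    response = None
    for _ in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        # Too many requests for the current tier; wait, then try again.
        time.sleep(delay)
        delay *= 2
    return response
```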
ai-services Select Domain https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/custom-vision-service/select-domain.md
This guide shows you how to select a domain for your project in the Custom Vision Service.
-From the **settings** tab of your project on the Custom Vision web portal, you can select a model domain for your project. You'll want to choose the domain that's closest to your use case scenario. If you're accessing Custom Vision through a client library or REST API, you'll need to specify a domain ID when creating the project. You can get a list of domain IDs with [Get Domains](/rest/api/customvision/training/domains/list?view=rest-customvision-training-v3.3&tabs=HTTP). Or, use the table below.
+From the **settings** tab of your project on the Custom Vision web portal, you can select a model domain for your project. You'll want to choose the domain that's closest to your use case scenario. If you're accessing Custom Vision through a client library or REST API, you'll need to specify a domain ID when creating the project. You can get a list of domain IDs with [Get Domains](/rest/api/customvision/get-domains). Or, use the table below.
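If you'd rather fetch the domain IDs programmatically than read them from the table, here's a minimal Python sketch of the Get Domains call; the endpoint, key, and response field names are assumptions to verify against the API reference.

```python
# Hedged sketch: list available domains and their IDs with the training API.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
TRAINING_KEY = "<your-training-key>"

domains = requests.get(
    f"{ENDPOINT}/customvision/v3.3/Training/domains",
    headers={"Training-key": TRAINING_KEY},
)
domains.raise_for_status()

for domain in domains.json():
    # Field names such as "type" are assumptions based on the Domain object.
    print(domain["id"], domain["name"], domain["type"])
```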
## Image Classification domains
ai-services Storage Integration https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/custom-vision-service/storage-integration.md
Now that you have the integration URLs, you can create a new Custom Vision proje
#### [Create a new project](#tab/create)
-When you call the [CreateProject](/rest/api/customvision/training/projects/create?view=rest-customvision-training-v3.3&tabs=HTTP) API, add the optional parameters _exportModelContainerUri_ and _notificationQueueUri_. Assign the URL values you got in the previous section.
+When you call the [CreateProject](/rest/api/customvision/create-project) API, add the optional parameters _exportModelContainerUri_ and _notificationQueueUri_. Assign the URL values you got in the previous section.
```curl curl -v -X POST "{endpoint}/customvision/v3.3/Training/projects?exportModelContainerUri={inputUri}&notificationQueueUri={inputUri}&name={inputName}"
If you receive a `200/OK` response, that means the URLs have been set up success
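The same call expressed as a hedged Python sketch, with the storage SAS URLs passed as query parameters; the placeholder values are assumptions.

```python
# Hedged sketch: create a project with export-model and notification-queue integration.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
TRAINING_KEY = "<your-training-key>"

params = {
    "name": "my-integrated-project",
    "exportModelContainerUri": "<blob-container-sas-url>",
    "notificationQueueUri": "<storage-queue-sas-url>",
}

response = requests.post(
    f"{ENDPOINT}/customvision/v3.3/Training/projects",
    headers={"Training-key": TRAINING_KEY},
    params=params,
)
response.raise_for_status()
print("Project ID:", response.json()["id"])
```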
#### [Update an existing project](#tab/update)
-To update an existing project with Azure storage feature integration, call the [UpdateProject](/rest/api/customvision/training/projects/update?view=rest-customvision-training-v3.3&tabs=HTTP) API, using the ID of the project you want to update.
+To update an existing project with Azure storage feature integration, call the [UpdateProject](/rest/api/customvision/update-project) API, using the ID of the project you want to update.
```curl curl -v -X PATCH "{endpoint}/customvision/v3.3/Training/projects/{projectId}"
In your notification queue, you should see a test notification in the following
## Get event notifications
-When you're ready, call the [TrainProject](/rest/api/customvision/training/projects/train?view=rest-customvision-training-v3.3&tabs=HTTP) API on your project to do an ordinary training operation.
+When you're ready, call the [TrainProject](/rest/api/customvision/train-project) API on your project to do an ordinary training operation.
In your Storage notification queue, you'll receive a notification once training finishes:
The `"trainingStatus"` field may be either `"TrainingCompleted"` or `"TrainingFa
## Get model export backups
-When you're ready, call the [ExportIteration](/rest/api/customvision/training/iterations/export?view=rest-customvision-training-v3.3&tabs=HTTP) API to export a trained model into a specified platform.
+When you're ready, call the [ExportIteration](/rest/api/customvision/export-iteration) API to export a trained model into a specified platform.
In your designated storage container, a backup copy of the exported model will appear. The blob name will have the format:
The `"exportStatus"` field may be either `"ExportCompleted"` or `"ExportFailed"`
## Next steps

In this guide, you learned how to copy and back up a project between Custom Vision resources. Next, explore the API reference docs to see what else you can do with Custom Vision.
-* [REST API reference documentation (training)](/rest/api/customvision/training/operation-groups?view=rest-customvision-training-v3.3)
+* [REST API reference documentation (training)](/rest/api/customvision/train-project)
* [REST API reference documentation (prediction)](/rest/api/customvision/predictions)
ai-services Concept Read https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/document-intelligence/concept-read.md
description: Extract print and handwritten text from scanned and digital documen
-
- - ignite-2023
Last updated 08/07/2024
The searchable PDF capability enables you to convert an analog PDF, such as scan
> [!IMPORTANT]
>
> * Currently, the searchable PDF capability is only supported by the Read OCR model `prebuilt-read`. When using this feature, please specify the `modelId` as `prebuilt-read`, as other model types will return an error for this preview version.
- > * Searchable PDF is included with the 2024-07-31-preview `prebuilt-read` model with no usage cost for general PDF consumption.
+ > * Searchable PDF is included with the 2024-07-31-preview `prebuilt-read` model with no additional cost for generating a searchable PDF output.
### Use searchable PDF
POST /documentModels/prebuilt-read:analyze?output=pdf
202 ```
-Once the `Analyze` operation is complete, make a `GET` request to retrieve the `Analyze` operation results.
+Poll for completion of the `Analyze` operation. Once the operation is complete, issue a `GET` request to retrieve the PDF format of the `Analyze` operation results.
Upon successful completion, the PDF can be retrieved and downloaded as `application/pdf`. This operation lets you download the PDF with embedded text directly, instead of receiving Base64-encoded JSON.
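A hedged end-to-end sketch of this flow in Python follows; the routes, `api-version`, and result-ID parsing are assumptions based on the `2024-07-31-preview` REST reference and should be verified against the service documentation.

```python
# Hedged sketch: analyze with output=pdf, poll the operation, then download the PDF.
import time
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"
API_VERSION = "2024-07-31-preview"
headers = {"Ocp-Apim-Subscription-Key": KEY}

# 1. Start the analyze operation with searchable PDF output.
submit = requests.post(
    f"{ENDPOINT}/documentintelligence/documentModels/prebuilt-read:analyze",
    params={"api-version": API_VERSION, "output": "pdf"},
    headers={**headers, "Content-Type": "application/json"},
    json={"urlSource": "<url-of-source-document>"},
)
submit.raise_for_status()
operation_url = submit.headers["Operation-Location"]

# 2. Poll until the operation finishes.
while True:
    status = requests.get(operation_url, headers=headers).json()
    if status["status"] in ("succeeded", "failed"):
        break
    time.sleep(2)

# 3. Retrieve the PDF with embedded text (result ID parsed from the operation URL).
result_id = operation_url.split("/analyzeResults/")[1].split("?")[0]
pdf = requests.get(
    f"{ENDPOINT}/documentintelligence/documentModels/prebuilt-read/analyzeResults/{result_id}/pdf",
    params={"api-version": API_VERSION},
    headers=headers,
)
pdf.raise_for_status()
with open("searchable.pdf", "wb") as f:
    f.write(pdf.content)
```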
ai-services Whats New https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/document-intelligence/whats-new.md
The Document Intelligence [**2024-07-31-preview**](/rest/api/aiservices/document
* **West Europe**
* **North Central US**
-* [Read model](concept-read.md) now supports [PDF output](concept-read.md#searchable-pdf) to download PDFs with embedded text from extraction results, allowing for PDF to be utilized in scenarios such as search and large language model ingestion.
-* [Layout model](concept-layout.md) now supports improved [figure detection](concept-layout.md#figures) where figures from documents can now be downloaded as an image file to be used for further figure understanding.
-* [Custom extraction models](concept-custom.md#custom-extraction-models)
- * Custom extraction models now support updating the model in-place.
-* [🆕 Custom generative (Document field extraction) model](concept-custom-generative.md)
- * Document Intelligence now offers new custom generative model that utilizes generative AI to extract fields from unstructured documents or structured forms with a wide variety of visual templates.
+* [🆕 Document field extraction (custom generative) model](concept-custom-generative.md)
+ * Use **Generative AI** to extract fields from documents and forms. Document Intelligence now offers a new document field extraction model that utilizes large language models (LLMs) to extract fields from unstructured documents or structured forms with a wide variety of visual templates. With grounded values and confidence scores, the new Generative AI based extraction fits into your existing processes.
+* [🆕 Model compose with custom classifiers](concept-composed-models.md)
+ * Document Intelligence now adds support for composing model with an explicit custom classification model. [Learn more about the benefits](concept-composed-models.md) of using the new compose capability.
* [Custom classification model](concept-custom.md#custom-classification-model)
  * Custom classification model now supports updating the model in-place as well.
  * Custom classification model adds support for model copy operation to enable backup and disaster recovery.
The Document Intelligence [**2024-07-31-preview**](/rest/api/aiservices/document
* New prebuilt to extract account information including beginning and ending balances, transaction details from bank statements.
* [🆕 US Tax model](concept-tax-document.md)
  * New unified US tax model that can extract from forms such as W-2, 1098, 1099, and 1040.
+* 🆕 Searchable PDF. The [prebuilt read](concept-read.md) model now supports [PDF output](concept-read.md#searchable-pdf) to download PDFs with embedded text from extraction results, allowing PDFs to be utilized in scenarios such as search and copying of contents.
+* [Layout model](concept-layout.md) now supports improved [figure detection](concept-layout.md#figures) where figures from documents can now be downloaded as an image file to be used for further figure understanding. The layout model also features improvements to the OCR model for scanned text targeting improvements for single characters, boxed text, and dense text documents.
+* [🆕 Batch API](concept-batch-analysis.md)
+ * Document Intelligence now adds support for batch analysis operation to support analyzing a set of documents to simplify developer experience and improve efficiency.
* [Add-on capabilities](concept-add-on-capabilities.md)
  * [Query fields](concept-add-on-capabilities.md#query-fields) AI quality of extraction is improved with the latest model.
-* [🆕 Batch API](concept-batch-analysis.md)
- * Document Intelligence now adds support for batch analysis operation to support analyzing a set of documents to simplify developer experience and improve service efficiency.
-* [🆕 Model compose with custom classifiers](concept-composed-models.md)
- * Document Intelligence now adds support for composing model with an explicit custom classification model.
## May 2024
ai-services Model Retirements https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/openai/concepts/model-retirements.md
description: Learn about the model deprecations and retirements in Azure OpenAI. Previously updated : 08/07/2024 Last updated : 08/08/2024
These models are currently available for use in Azure OpenAI Service.
| Model | Version | Retirement date |
| - | - | - |
| `gpt-35-turbo` | 0301 | No earlier than October 1, 2024 |
-| `gpt-35-turbo`<br>`gpt-35-turbo-16k` | 0613 | October 1, 2024 |
+| `gpt-35-turbo`<br>`gpt-35-turbo-16k` | 0613 | November 1, 2024 |
| `gpt-35-turbo` | 1106 | No earlier than Nov 17, 2024 |
| `gpt-35-turbo` | 0125 | No earlier than Feb 22, 2025 |
| `gpt-4`<br>`gpt-4-32k` | 0314 | **Deprecation:** October 1, 2024 <br> **Retirement:** June 6, 2025 |
If you're an existing customer looking for information about these models, see [
## Retirement and deprecation history
+### August 8, 2024
+
+* Updated `gpt-35-turbo` & `gpt-35-turbo-16k` (0613) model's retirement date to November 1, 2024.
### July 30, 2024

* Updated `gpt-4` preview model upgrade date to November 15, 2024 or later for the following versions:
ai-services Quotas Limits https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/openai/quotas-limits.md
- ignite-2023 - references_regions Previously updated : 08/06/2024 Last updated : 08/08/2024
Global Standard deployments use Azure's global infrastructure, dynamically routi
The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
+> [!NOTE]
+> Usage tiers only apply to standard and global standard deployment types. Usage tiers do not apply to global batch deployments.
#### GPT-4o global standard & standard

|Model| Usage Tiers per month |
|-|-|
-|`gpt-4o` |1.5 Billion tokens |
+|`gpt-4o` | 8 Billion tokens |
|`gpt-4o-mini` | 45 Billion tokens |
+#### GPT-4 standard
+
+|Model| Usage Tiers per month|
+|-|-|
+| `gpt-4` + `gpt-4-32k` (all versions) | 4 Billion |
## Other offer types

If your Azure subscription is linked to certain [offer types](https://azure.microsoft.com/support/legal/offer-details/), your max quota values are lower than the values indicated in the above tables.
ai-services Speech Container Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-services/speech-service/speech-container-overview.md
Previously updated : 1/22/2024 Last updated : 8/7/2024 keywords: on-premises, Docker, container
The following table lists the Speech containers available in the Microsoft Conta
| Container | Features | Supported versions and locales |
|--|--|--|
-| [Speech to text](speech-container-stt.md) | Transcribes continuous real-time speech or batch audio recordings with intermediate results. | Latest: 4.7.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list).|
-| [Custom speech to text](speech-container-cstt.md) | Using a custom model from the [custom speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | Latest: 4.7.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/custom-speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list). |
-| [Speech language identification](speech-container-lid.md)<sup>1, 2</sup> | Detects the language spoken in audio files. | Latest: 1.13.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/language-detection/tags/list). |
+| [Speech to text](speech-container-stt.md) | Transcribes continuous real-time speech or batch audio recordings with intermediate results. | Latest: 4.8.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list).|
+| [Custom speech to text](speech-container-cstt.md) | Using a custom model from the [custom speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | Latest: 4.8.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/custom-speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list). |
+| [Speech language identification](speech-container-lid.md)<sup>1, 2</sup> | Detects the language spoken in audio files. | Latest: 1.14.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/language-detection/tags/list). |
| [Neural text to speech](speech-container-ntts.md) | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | Latest: 3.3.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/neural-text-to-speech/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/neural-text-to-speech/tags/list). |

<sup>1</sup> The container is available in public preview. Containers in preview are still under development and don't meet Microsoft's stability and support requirements.
ai-studio Configure Managed Network https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/configure-managed-network.md
You need to configure following network isolation configurations.
- Create private endpoint outbound rules to your private Azure resources. Private Azure AI Search isn't supported yet.
- If you use Visual Studio Code integration with allow only approved outbound mode, create FQDN outbound rules described in the [use Visual Studio Code](#scenario-use-visual-studio-code) section.
- If you use HuggingFace models in Models with allow only approved outbound mode, create FQDN outbound rules described in the [use HuggingFace models](#scenario-use-huggingface-models) section.
+- If you use one of the open-source models with allow only approved outbound mode, create FQDN outbound rules described in the [curated by Azure AI](#scenario-curated-by-azure-ai) section.
## Network isolation architecture and isolation modes
If you plan to use __HuggingFace models__ with the hub, add outbound _FQDN_ rule
* cnd.auth0.com
* cdn-lfs.huggingface.co
+### Scenario: Curated by Azure AI
+
+These models involve dynamic installation of dependencies at runtime and require outbound _FQDN_ rules to allow traffic to the following hosts:
+
+*.anaconda.org
+*.anaconda.com
+anaconda.com
+pypi.org
+*.pythonhosted.org
+*.pytorch.org
+pytorch.org
## Private endpoints

Private endpoints are currently supported for the following Azure
ai-studio Deploy Jais Models https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-jais-models.md
- Title: How to deploy JAIS models with Azure AI Studio-
-description: Learn how to deploy JAIS models with Azure AI Studio.
--- Previously updated : 5/21/2024------
-# How to deploy JAIS with Azure AI Studio
--
-In this article, you learn how to use Azure AI Studio to deploy the JAIS model as serverless APIs with pay-as-you-go token-based billing.
-
-The JAIS model is available in [Azure AI Studio](https://ai.azure.com) with pay-as-you-go token based billing with Models as a Service.
-
-You can find the JAIS model in the [Model Catalog](model-catalog.md) by filtering on the JAIS collection.
-
-### Prerequisites
--- An Azure subscription with a valid payment method. Free or trial Azure subscriptions will not work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.-- An [Azure AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for JAIS is only available with hubs created in these regions:-
- * East US
- * East US 2
- * North Central US
- * South Central US
- * West US
- * West US 3
- * Sweden Central
-
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
-- An [AI Studio project](../how-to/create-projects.md) in Azure AI Studio.-- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).--
-### JAIS 30b Chat
-
-JAIS 30b Chat is an auto-regressive bi-lingual LLM for **Arabic** & **English**. The tuned versions use supervised fine-tuning (SFT). The model is fine-tuned with both Arabic and English prompt-response pairs. The fine-tuning datasets included a wide range of instructional data across various domains. The model covers a wide range of common tasks including question answering, code generation, and reasoning over textual content. To enhance performance in Arabic, the Core42 team developed an in-house Arabic dataset as well as translating some open-source English instructions into Arabic.
-
-*Context length:* JAIS supports a context length of 8K.
-
-*Input:* Model input is text only.
-
-*Output:* Model generates text only.
-
-## Deploy as a serverless API
--
-Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
-
-The previously mentioned JAIS 30b Chat model can be deployed as a service with pay-as-you-go billing and is offered by Core42 through the Microsoft Azure Marketplace. Core42 can change or update the terms of use and pricing of this model.
--
-### Create a new deployment
-
-To create a deployment:
-
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Select **Model catalog** from the left sidebar.
-1. Search for *JAIS* and select the model _Jais-30b-chat_.
-
- :::image type="content" source="../media/deploy-monitor/jais/jais-search.png" alt-text="A screenshot showing a model in the model catalog." lightbox="../media/deploy-monitor/jais/jais-search.png":::
-
-2. Select **Deploy** to open a serverless API deployment window for the model.
-
- :::image type="content" source="../media/deploy-monitor/jais/jais-deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model with the pay-as-you-go option." lightbox="../media/deploy-monitor/jais/jais-deploy-pay-as-you-go.png":::
-
-1. Select the project in which you want to deploy your model. To deploy the model your project must be in the East US 2 or Sweden Central region.
-1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
-1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
-1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering. This step requires that your account has the **Azure AI Developer role** permissions on the resource group, as listed in the prerequisites. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending. Currently, you can have only one deployment for each model within a project.
-
- :::image type="content" source="../media/deploy-monitor/jais/jais-marketplace-terms.png" alt-text="A screenshot showing the terms and conditions of a given model." lightbox="../media/deploy-monitor/jais/jais-marketplace-terms.png":::
-
-1. Once you subscribe the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. If this scenario applies to you, there's a **Continue to deploy** option to select.
-
- :::image type="content" source="../media/deploy-monitor/jais/jais-existing-subscription.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="../media/deploy-monitor/jais/jais-existing-subscription.png":::
-
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
-
- :::image type="content" source="../media/deploy-monitor/jais/jais-deployment-name.png" alt-text="A screenshot showing how to indicate the name of the deployment you want to create." lightbox="../media/deploy-monitor/jais/jais-deployment-name.png":::
-
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
-1. Select **Open in playground** to start interacting with the model.
-1. You can return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [reference](#chat-api-reference-for-jais-deployed-as-a-service) section.
-1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
-
-To learn about billing for the JAIS models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for JAIS models deployed as a service](#cost-and-quota-considerations-for-models-deployed-as-a-service)
-
-### Consume the JAIS 30b Chat model as a service
-
-These models can be consumed using the chat API.
-
-1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
-
-1. Find and select the deployment you created.
-
-1. Copy the **Target** URL and the **Key** value.
-
-For more information on using the APIs, see the [reference](#chat-api-reference-for-jais-deployed-as-a-service) section.
-
-## Chat API reference for JAIS deployed as a service
-
-### v1/chat/completions
-
-#### Request
-```
- POST /v1/chat/completions HTTP/1.1
- Host: <DEPLOYMENT_URI>
- Authorization: Bearer <TOKEN>
- Content-type: application/json
-```
-
-#### v1/chat/completions request schema
-
-JAIS 30b Chat accepts the following parameters for a `v1/chat/completions` response inference call:
-
-| Property | Type | Default | Description |
-| | | | |
-| `messages` | `array` | `None` | Text input for the model to respond to. |
-| `max_tokens` | `integer` | `None` | The maximum number of tokens the model generates as part of the response. Note: Setting a low value might result in incomplete generations. If not specified, generates tokens until end of sequence. |
-| `temperature` | `float` | `0.3` | Controls randomness in the model. Lower values make the model more deterministic and higher values make the model more random. |
-| `top_p` | `float` |`None`|The cumulative probability of parameter highest probability vocabulary tokens to keep for nucleus sampling, defaults to null.|
-| `top_k` | `integer` |`None`|The number of highest probability vocabulary tokens to keep for top-k-filtering, defaults to null.|
--
-A System or User Message supports the following properties:
-
-| Property | Type | Default | Description |
-| | | | |
-| `role` | `enum` | Required | `role=system` or `role=user`. |
-|`content` |`string` |Required |Text input for the model to respond to. |
--
-An Assistant Message supports the following properties:
-
-| Property | Type | Default | Description |
-| | | | |
-| `role` | `enum` | Required | `role=assistant`|
-|`content` |`string` |Required |The contents of the assistant message. |
--
-#### v1/chat/completions response schema
-
-The response payload is a dictionary with the following fields:
-
-| Key | Type | Description |
-| | | |
-| `id` | `string` | A unique identifier for the completion. |
-| `choices` | `array` | The list of completion choices the model generated for the input messages. |
-| `created` | `integer` | The Unix timestamp (in seconds) of when the completion was created. |
-| `model` | `string` | The model_id used for completion. |
-| `object` | `string` | chat.completion. |
-| `usage` | `object` | Usage statistics for the completion request. |
-
-The `choices` object is a dictionary with the following fields:
-
-| Key | Type | Description |
-| | | |
-| `index` | `integer` | Choice index. |
-| `messages` or `delta` | `string` | Chat completion result in messages object. When streaming mode is used, delta key is used. |
-| `finish_reason` | `string` | The reason the model stopped generating tokens. |
-
-The `usage` object is a dictionary with the following fields:
-
-| Key | Type | Description |
-| | | |
-| `prompt_tokens` | `integer` | Number of tokens in the prompt. |
-| `completion_tokens` | `integer` | Number of tokens generated in the completion. |
-| `total_tokens` | `integer` | Total tokens. |
--
-#### Examples
-
-##### Arabic
-Request:
-
-```json
- "messages": [
- {
- "role": "user",
- "content": "ما هي الأماكن الشهيرة التي يجب زيارتها في الإمارات؟"
- }
- ]
-```
-
-Response:
-
-```json
- {
- "id": "df23b9f7-e6bd-493f-9437-443c65d428a1",
- "choices": [
- {
- "index": 0,
- "finish_reason": "stop",
- "message": {
- "role": "assistant",
- "content": "هناك العديد من الأماكن المذهلة للزيارة في الإمارات! ومن أشهرها برج خليفة في دبي وهو أطول مبنى في العالم ، ومسجد الشيخ زايد الكبير في أبوظبي والذي يعد أحد أجمل المساجد في العالم ، وصحراء ليوا في الظفرة والتي تعد أكبر صحراء رملية في العالم وتجذب الكثير من السياح لتجربة ركوب الجمال والتخييم في الصحراء. كما يمكن للزوار الاستمتاع بالشواطئ الجميلة في دبي وأبوظبي والشارقة ورأس الخيمة، وزيارة متحف اللوفر أبوظبي للتعرف على تاريخ الفن والثقافة العالمية"
- }
- }
- ],
- "created": 1711734274,
- "model": "jais-30b-chat",
- "object": "chat.completion",
- "usage": {
- "prompt_tokens": 23,
- "completion_tokens": 744,
- "total_tokens": 767
- }
- }
-```
-
-##### English
-Request:
-
-```json
- "messages": [
- {
- "role": "user",
- "content": "List the emirates of the UAE."
- }
- ]
-```
-
-Response:
-
-```json
- {
- "id": "df23b9f7-e6bd-493f-9437-443c65d428a1",
- "choices": [
- {
- "index": 0,
- "finish_reason": "stop",
- "message": {
- "role": "assistant",
- "content": "The seven emirates of the United Arab Emirates are: Abu Dhabi, Dubai, Sharjah, Ajman, Umm Al-Quwain, Fujairah, and Ras Al Khaimah."
- }
- }
- ],
- "created": 1711734274,
- "model": "jais-30b-chat",
- "object": "chat.completion",
- "usage": {
- "prompt_tokens": 23,
- "completion_tokens": 60,
- "total_tokens": 83
- }
- }
-```
-
-##### More inference examples
-
-| **Sample Type** | **Sample Notebook** |
-|-|-|
-| CLI using CURL and Python web requests | [webrequests.ipynb](https://aka.ms/jais/webrequests-sample) |
-| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/jais/openaisdk) |
-| LiteLLM | [litellm.ipynb](https://aka.ms/jais/litellm-sample) |
-
-## Cost and quotas
-
-### Cost and quota considerations for models deployed as a service
-
-JAIS 30b Chat is deployed as a service are offered by Core42 through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
-
-Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
-
-For more information on how to track costs, see [monitor costs for models offered throughout the Azure Marketplace](../how-to/costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
-
-Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
-
-## Content filtering
-
-Models deployed as a service with pay-as-you-go billing are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
-
-## Next steps
--- [What is Azure AI Studio?](../what-is-ai-studio.md)-- [Azure AI FAQ article](../faq.yml)-- [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
ai-studio Deploy Models Cohere Command https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-cohere-command.md
Title: How to deploy Cohere Command models with Azure AI Studio
+ Title: How to use Cohere Command chat models with Azure AI Studio
-description: Learn how to deploy Cohere Command models with Azure AI Studio.
-
+description: Learn how to use Cohere Command chat models with Azure AI Studio.
+ Previously updated : 5/21/2024 Last updated : 08/08/2024
+reviewer: shubhirajMsft
-+
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
++
+# How to use Cohere Command chat models
+
+In this article, you learn about Cohere Command chat models and how to use them.
+The Cohere family includes models optimized for different use cases, including chat completions, embeddings, and rerank. The Cohere Command chat models are optimized for use cases that include reasoning, summarization, and question answering.
++++
+## Cohere Command chat models
+
+The Cohere Command chat models include the following models:
+
+# [Cohere Command R+](#tab/cohere-command-r-plus)
+
+Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
+
+We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents).
++
+The following models are available:
+
+* [Cohere-command-r-plus](https://aka.ms/azureai/landing/Cohere-command-r-plus)
++
+# [Cohere Command R](#tab/cohere-command-r)
+
+Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
+
+Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration.
++
+The following models are available:
+
+* [Cohere-command-r](https://aka.ms/azureai/landing/Cohere-command-r)
++++
+> [!TIP]
+> Additionally, Cohere supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [Cohere documentation](https://docs.cohere.com/reference/about) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Cohere Command chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Cohere Command chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
+
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Cohere Command chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
+
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
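If your deployment is configured for Microsoft Entra ID instead of key authentication, a hedged alternative sketch is to pass a token credential from the `azure-identity` package; depending on your deployment type, you might also need to supply specific credential scopes.

```python
# Hedged sketch: authenticate with Microsoft Entra ID instead of a key.
# Requires: pip install azure-identity
import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
)
```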
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```python
+model_info = client.get_model_info()
+```
+
+The response is as follows:
++
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
+
+```console
+Model name: Cohere-command-r-plus
+Model type: chat-completions
+Model provider name: Cohere
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Cohere-command-r-plus
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
+
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+ """
+ Prints the chat completion with streaming. Some delay is added to simulate
+ a real-time conversation.
+ """
+ import time
+ for update in result:
+ if update.choices:
+ print(update.choices[0].delta.content, end="")
+ time.sleep(0.05)
+```
+
+You can visualize how streaming generates content:
++
+```python
+print_stream(result)
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Cohere Command chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```python
+response = client.complete(
+ messages=[
+        SystemMessage(content="You are a helpful assistant that always generates responses in JSON format, using"
+                      " the following format: { \"answer\": \"response\" }."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
+)
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+### Use tools
+
+Cohere Command chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that is able to look up flight information between two different cities.
++
+```python
+from azure.ai.inference.models import FunctionDefinition, ChatCompletionsFunctionToolDefinition
+
+flight_info = ChatCompletionsFunctionToolDefinition(
+ function=FunctionDefinition(
+ name="get_flight_info",
+ description="Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ parameters={
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates",
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city",
+ },
+ },
+ "required": ["origin_city", "destination_city"],
+ },
+ )
+)
+
+tools = [flight_info]
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```python
+def get_flight_info(loc_origin: str, loc_destination: str):
+ return {
+ "info": f"There are no flights available from {loc_origin} to {loc_destination}. You should take a train, specially if it helps to reduce CO2 emissions."
+ }
+```
+
+> [!NOTE]
+> Cohere-command-r-plus and Cohere-command-r require a tool's responses to be a valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
+
+Prompt the model to book flights with the help of this function:
++
+```python
+messages = [
+ SystemMessage(
+ content="You are a helpful assistant that help users to find information about traveling, how to get"
+ " to places and the different transportations options. You care about the environment and you"
+ " always have that in mind when answering inqueries.",
+ ),
+ UserMessage(
+ content="When is the next flight from Miami to Seattle?",
+ ),
+]
+
+response = client.complete(
+ messages=messages, tools=tools, tool_choice="auto"
+)
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```python
+response_message = response.choices[0].message
+tool_calls = response_message.tool_calls
+
+print("Finish reason:", response.choices[0].finish_reason)
+print("Tool call:", tool_calls)
+```
+
+To continue, append this message to the chat history:
++
+```python
+messages.append(
+ response_message
+)
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```python
+import json
+from azure.ai.inference.models import ToolMessage
+
+for tool_call in tool_calls:
+
+ # Get the tool details:
+
+ function_name = tool_call.function.name
+ function_args = json.loads(tool_call.function.arguments.replace("\'", "\""))
+ tool_call_id = tool_call.id
+
+ print(f"Calling function `{function_name}` with arguments {function_args}")
+
+    # Call the function defined above using `locals()`, which returns a dictionary with all
+    # the names available in the local scope. Notice that this is just done as a simple way to get
+    # the function callable from its string name. Then we can call it with the corresponding
+    # arguments.
+
+ callable_func = locals()[function_name]
+ function_response = callable_func(**function_args)
+
+ print("->", function_response)
+
+    # Once we have a response from the function and its arguments, we can append a new message to the chat
+    # history. Notice how we tell the model that this chat message came from a tool:
+
+ messages.append(
+ ToolMessage(
+ tool_call_id=tool_call_id,
+ content=json.dumps(function_response)
+ )
+ )
+```
+
+View the response from the model:
++
+```python
+response = client.complete(
+ messages=messages,
+ tools=tools,
+)
+```
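+
+You can then print the model's final answer, which now incorporates the tool's output:
+
+```python
+print(response.choices[0].message.content)
+```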
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```python
+from azure.core.exceptions import HttpResponseError
+from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
+
+try:
+ response = client.complete(
+ messages=[
+ SystemMessage(content="You are an AI assistant that helps people find information."),
+ UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
+ ]
+ )
+
+ print(response.choices[0].message.content)
+
+except HttpResponseError as ex:
+    if ex.status_code == 400:
+        response = ex.response.json()
+        if isinstance(response, dict) and "error" in response:
+            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
+        else:
+            raise
+    else:
+        raise
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Cohere Command chat models
+
+The Cohere Command chat models include the following models:
+
+# [Cohere Command R+](#tab/cohere-command-r-plus)
+
+Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
+
+We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents).
++
+The following models are available:
+
+* [Cohere-command-r-plus](https://aka.ms/azureai/landing/Cohere-command-r-plus)
++
+# [Cohere Command R](#tab/cohere-command-r)
+
+Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
+
+Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration.
++
+The following models are available:
+
+* [Cohere-command-r](https://aka.ms/azureai/landing/Cohere-command-r)
++
-# How to deploy Cohere Command models with Azure AI Studio
+> [!TIP]
+> Additionally, Cohere supports a tailored API for use with specific features of the model. To use the model-provider-specific API, check the [Cohere documentation](https://docs.cohere.com/reference/about) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Cohere Command chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Cohere Command chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Cohere Command chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: Cohere-command-r-plus
+Model type: chat-completions
+Model provider name: Cohere
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Cohere-command-r-plus
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
+
+To stream completions, use `.asNodeStream()` when you call the model.
+
+You can visualize how streaming generates content:
++
+```javascript
+import { createSseStream } from "@azure/core-sse";
+
+var stream = response.body;
+if (!stream) {
+    throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+ throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ presence_penalty: "0.1",
+ frequency_penalty: "0.8",
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Cohere Command chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```javascript
+var messages = [
+    { role: "system", content: "You are a helpful assistant that always generates responses in JSON format, using"
+        + " the following format: { \"answer\": \"response\" }." },
+    { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ response_format: { type: "json_object" }
+ }
+});
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+### Use tools
+
+Cohere Command chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information between two cities.
++
+```javascript
+const flight_info = {
+ name: "get_flight_info",
+ description: "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ parameters: {
+ type: "object",
+ properties: {
+ origin_city: {
+ type: "string",
+ description: "The name of the city where the flight originates",
+ },
+ destination_city: {
+ type: "string",
+ description: "The flight destination city",
+ },
+ },
+ required: ["origin_city", "destination_city"],
+ },
+}
+
+const tools = [
+ {
+ type: "function",
+ function: flight_info,
+ },
+];
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```javascript
+function get_flight_info(loc_origin, loc_destination) {
+    return {
+        info: "There are no flights available from " + loc_origin + " to " + loc_destination + ". You should take a train, especially if it helps to reduce CO2 emissions."
+    }
+}
+```
+
+> [!NOTE]
+> Cohere-command-r-plus and Cohere-command-r require a tool's response to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
+
+Prompt the model to book flights with the help of this function:
++
+```javascript
+var result = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ tools: tools,
+ tool_choice: "auto"
+ }
+});
+```
+
+You can inspect the response to find out whether a tool needs to be called: check the finish reason to determine if the model requested a tool call. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```javascript
+const response_message = response.body.choices[0].message;
+const tool_calls = response_message.tool_calls;
+
+console.log("Finish reason: " + response.body.choices[0].finish_reason);
+console.log("Tool call: " + tool_calls);
+```
+
+To continue, append this message to the chat history:
++
+```javascript
+messages.push(response_message);
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```javascript
+function applyToolCall({ function: call, id }) {
+    // Get the tool details:
+    const tool_params = JSON.parse(call.arguments);
+    console.log("Calling function " + call.name + " with arguments " + JSON.stringify(tool_params));
+
+    // Map the tool name to the function defined above. This is just a simple way to get
+    // the function callable from its string name. Then we can call it with the corresponding
+    // arguments.
+    const available_functions = { get_flight_info };
+    const function_response = available_functions[call.name](tool_params.origin_city, tool_params.destination_city);
+    console.log("-> " + JSON.stringify(function_response));
+
+    return function_response;
+}
+
+for (const tool_call of tool_calls) {
+    const tool_response = applyToolCall(tool_call);
+
+    messages.push(
+        {
+            role: "tool",
+            tool_call_id: tool_call.id,
+            content: JSON.stringify(tool_response)
+        }
+    );
+}
+```
+
+View the response from the model:
++
+```javascript
+var result = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ tools: tools,
+ }
+});
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```javascript
+try {
+ var messages = [
+ { role: "system", content: "You are an AI assistant that helps people find information." },
+ { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
+ ];
+
+ var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+ });
+
+ console.log(response.body.choices[0].message.content);
+}
+catch (error) {
+    if (error.status_code == 400) {
+        var response = JSON.parse(error.response._content);
+        if (response.error) {
+            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
+        }
+        else
+        {
+            throw error;
+        }
+    }
+    else
+    {
+        throw error;
+    }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Cohere Command chat models
+
+The Cohere Command chat models include the following models:
+
+# [Cohere Command R+](#tab/cohere-command-r-plus)
+
+Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
+
+We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents).
++
+The following models are available:
+
+* [Cohere-command-r-plus](https://aka.ms/azureai/landing/Cohere-command-r-plus)
++
+# [Cohere Command R](#tab/cohere-command-r)
+
+Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering.
+
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
+
+Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration.
++
+The following models are available:
+
+* [Cohere-command-r](https://aka.ms/azureai/landing/Cohere-command-r)
++++
+> [!TIP]
+> Additionally, Cohere supports a tailored API for use with specific features of the model. To use the model-provider-specific API, check the [Cohere documentation](https://docs.cohere.com/reference/about) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Cohere Command chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Cohere Command chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you might not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Cohere Command chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: Cohere-command-r-plus
+Model type: chat-completions
+Model provider name: Cohere
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Cohere-command-r-plus
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
-In this article, you learn how to use Azure AI Studio to deploy the Cohere Command models as serverless APIs with pay-as-you-go token-based billing.
-Cohere offers two Command models in [Azure AI Studio](https://ai.azure.com). These models are available as serverless APIs with pay-as-you-go token-based billing. You can browse the Cohere family of models in the [model catalog](model-catalog.md) by filtering on the Cohere collection.
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
-## Cohere Command models
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
+}
+```
-In this section, you learn about the two Cohere Command models that are available in the model catalog:
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that in this example, the call is wrapped in an asynchronous method.
-* Cohere Command R
-* Cohere Command R+
+To visualize the output, define an asynchronous method to print the stream in the console.
-You can browse the Cohere family of models in the [Model Catalog](model-catalog-overview.md) by filtering on the Cohere collection.
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
+{
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
+ {
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
+ }
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
+}
+```
-- **Model Architecture:** Both Command R and Command R+ are auto-regressive language models that use an optimized transformer architecture. After pretraining, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+You can visualize how streaming generates content:
-- **Languages covered:** The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic.
- Pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, Persian.
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
-- **Context length:** Command R and Command R+ support a context length of 128K.
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
-- **Input:** Models input text only.
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
-- **Output:** Models generate text only.
+#### Create JSON outputs
-## Deploy as a serverless API
+Cohere Command chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
-Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
-The previously mentioned Cohere models can be deployed as a service with pay-as-you-go billing and are offered by Cohere through the Microsoft Azure Marketplace. Cohere can change or update the terms of use and pricing of these models.
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+    Messages = {
+        new ChatRequestSystemMessage(
+            "You are a helpful assistant that always generates responses in JSON format, " +
+            "using the following format: { \"answer\": \"response\" }."
+        ),
+        new ChatRequestUserMessage(
+            "How many languages are in the world?"
+        )
+    },
+    ResponseFormat = new ChatCompletionsResponseFormatJSON()
+};
-### Prerequisites
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
-- An [Azure AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for Cohere Command is only available with hubs created in these regions:
+### Pass extra parameters to the model
- * East US
- * East US 2
- * North Central US
- * South Central US
- * West US
- * West US 3
- * Sweden Central
-
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
-- An [AI Studio project](../how-to/create-projects.md) in Azure AI Studio.
-- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
-### Create a new deployment
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
-The following steps demonstrate the deployment of Cohere Command R, but you can use the same steps to deploy Cohere Command R+ by replacing the model name.
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
-To create a deployment:
+### Use tools
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Select **Model catalog** from the left sidebar.
-1. Search for *Cohere*.
-1. Select **Cohere-command-r** to open the Model Details page.
+Cohere Command chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
- :::image type="content" source="../media/deploy-monitor/cohere-command/command-r-deploy-directly-from-catalog.png" alt-text="A screenshot showing how to access the model details page by going through the model catalog." lightbox="../media/deploy-monitor/cohere-command/command-r-deploy-directly-from-catalog.png":::
+The following code example creates a tool definition that can look up flight information between two cities.
-1. Select **Deploy** to open a serverless API deployment window for the model.
-1. Alternatively, you can initiate a deployment by starting from your project in AI Studio.
- 1. From the left sidebar of your project, select **Components** > **Deployments**.
- 1. Select **+ Create deployment**.
- 1. Search for and select **Cohere-command-r**. to open the Model Details page.
-
- :::image type="content" source="../media/deploy-monitor/cohere-command/command-r-deploy-start-from-project.png" alt-text="A screenshot showing how to access the model details page by going through the Deployments page in your project." lightbox="../media/deploy-monitor/cohere-command/command-r-deploy-start-from-project.png":::
+```csharp
+FunctionDefinition flightInfoFunction = new FunctionDefinition("getFlightInfo")
+{
+ Description = "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ Parameters = BinaryData.FromObjectAsJson(new
+ {
+ Type = "object",
+ Properties = new
+ {
+ origin_city = new
+ {
+ Type = "string",
+ Description = "The name of the city where the flight originates"
+ },
+ destination_city = new
+ {
+ Type = "string",
+ Description = "The flight destination city"
+ }
+ }
+ },
+ new JsonSerializerOptions() { PropertyNamingPolicy = JsonNamingPolicy.CamelCase }
+ )
+};
- 1. Select **Confirm** to open a serverless API deployment window for the model.
+ChatCompletionsFunctionToolDefinition getFlightTool = new ChatCompletionsFunctionToolDefinition(flightInfoFunction);
+```
- :::image type="content" source="../media/deploy-monitor/cohere-command/command-r-deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model as a serverless API." lightbox="../media/deploy-monitor/cohere-command/command-r-deploy-pay-as-you-go.png":::
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
-1. Select the project in which you want to deploy your model. To deploy the model your project must be in the *EastUS2* or *Sweden Central* region.
-1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
-1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
-1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering. This step requires that your account has the **Azure AI Developer role** permissions on the resource group, as listed in the prerequisites. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending. Currently, you can have only one deployment for each model within a project.
-1. Once you subscribe the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. If this scenario applies to you, there's a **Continue to deploy** option to select.
- :::image type="content" source="../media/deploy-monitor/cohere-command/command-r-existing-subscription.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="../media/deploy-monitor/cohere-command/command-r-existing-subscription.png":::
+```csharp
+static string getFlightInfo(string loc_origin, string loc_destination)
+{
+    return JsonSerializer.Serialize(new
+    {
+        info = $"There are no flights available from {loc_origin} to {loc_destination}. You " +
+        "should take a train, especially if it helps to reduce CO2 emissions."
+    });
+}
+```
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
+> [!NOTE]
+> Cohere-command-r-plus and Cohere-command-r require a tool's response to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string.
- :::image type="content" source="../media/deploy-monitor/cohere-command/command-r-deployment-name.png" alt-text="A screenshot showing how to indicate the name of the deployment you want to create." lightbox="../media/deploy-monitor/cohere-command/command-r-deployment-name.png":::
+Prompt the model to book flights with the help of this function:
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
-1. Select **Open in playground** to start interacting with the model.
-1. Return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [reference](#reference-for-cohere-models-deployed-as-a-service) section.
-1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
-To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for models deployed as a serverless API](#cost-and-quota-considerations-for-models-deployed-as-a-serverless-api).
+```csharp
+var chatHistory = new List<ChatRequestMessage>(){
+    new ChatRequestSystemMessage(
+        "You are a helpful assistant that helps users find information about traveling, " +
+        "how to get to places and the different transportation options. You care about the " +
+        "environment and you always have that in mind when answering inquiries."
+    ),
+    new ChatRequestUserMessage("When is the next flight from Miami to Seattle?")
+    };
-### Consume the Cohere models as a service
+requestOptions = new ChatCompletionsOptions(chatHistory);
+requestOptions.Tools.Add(getFlightTool);
+requestOptions.ToolChoice = ChatCompletionsToolChoice.Auto;
-These models can be consumed using the chat API.
+response = client.Complete(requestOptions);
+```
-1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
+You can inspect the response to find out whether a tool needs to be called: check the finish reason to determine if the model requested a tool call. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
-1. Find and select the deployment you created.
-1. Copy the **Target** URL and the **Key** value.
+```csharp
+var responseMessage = response.Value.Choices[0].Message;
+var toolCalls = responseMessage.ToolCalls;
-2. Cohere exposes two routes for inference with the Command R and Command R+ models. The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` and the native [Cohere API](#cohere-chat-api).
+Console.WriteLine($"Finish reason: {response.Value.Choices[0].FinishReason}");
+Console.WriteLine($"Tool call: {toolsCall[0].Id}");
+```
-For more information on using the APIs, see the [reference](#reference-for-cohere-models-deployed-as-a-service) section.
+To continue, append this message to the chat history:
-## Reference for Cohere models deployed as a service
-Cohere Command R and Command R+ models accept both the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` and the native [Cohere Chat API](#cohere-chat-api) on `/v1/chat`.
+```csharp
+requestOptions.Messages.Add(new ChatRequestAssistantMessage(response.Value.Choices[0].Message));
+```
-### Azure AI Model Inference API
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
-The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
-### Cohere Chat API
+```csharp
+foreach (ChatCompletionsToolCall tool in toolCalls)
+{
+    if (tool is ChatCompletionsFunctionToolCall functionTool)
+    {
+        // Get the tool details:
+        string callId = functionTool.Id;
+        string toolName = functionTool.Name;
+        string toolArgumentsString = functionTool.Arguments;
+        Dictionary<string, object> toolArguments = JsonSerializer.Deserialize<Dictionary<string, object>>(toolArgumentsString);
+
+        // Here you have to call the function defined. In this particular example we use
+        // reflection to find the method we defined before in a static class called
+        // `ChatCompletionsExamples`. Using reflection allows us to call a function
+        // by string name. Notice that this is just done for demonstration purposes as a
+        // simple way to get the function callable from its string name. Then we can call
+        // it with the corresponding arguments.
+
+        var flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static;
+        string toolResponse = (string)typeof(ChatCompletionsExamples).GetMethod(toolName, flags).Invoke(null, toolArguments.Values.Cast<object>().ToArray());
+
+        Console.WriteLine($"-> {toolResponse}");
+        requestOptions.Messages.Add(new ChatRequestToolMessage(toolResponse, callId));
+    }
+ else
+ throw new Exception("Unsupported tool type");
+}
+```
-The following contains details about Cohere Chat API.
+View the response from the model:
-#### Request
+```csharp
+response = client.Complete(requestOptions);
```
- POST /v1/chat HTTP/1.1
- Host: <DEPLOYMENT_URI>
- Authorization: Bearer <TOKEN>
- Content-type: application/json
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```csharp
+try
+{
+ requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+ new ChatRequestUserMessage(
+ "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ ),
+ },
+ };
+
+ response = client.Complete(requestOptions);
+ Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+    if (ex.ErrorCode == "content_filter")
+    {
+        Console.WriteLine($"Your query has triggered Azure AI content safety: {ex.Message}");
+    }
+ else
+ {
+ throw;
+ }
+}
```
-#### v1/chat request schema
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Cohere Command chat models
+
+The Cohere Command chat models include the following models:
+
+# [Cohere Command R+](#tab/cohere-command-r-plus)
-Cohere Command R and Command R+ accept the following parameters for a `v1/chat` response inference call:
+Command R+ is a generative large language model optimized for various use cases, including reasoning, summarization, and question answering.
-|Key |Type |Default |Description |
-|||||
-|`message` |`string` |Required |Text input for the model to respond to. |
-|`chat_history` |`array of messages` |`None` |A list of previous messages between the user and the model, meant to give the model conversational context for responding to the user's message. |
-|`documents` |`array` |`None ` |A list of relevant documents that the model can cite to generate a more accurate reply. Each document is a string-string dictionary. Keys and values from each document are serialized to a string and passed to the model. The resulting generation includes citations that reference some of these documents. Some suggested keys are "text", "author", and "date". For better generation quality, it's recommended to keep the total word count of the strings in the dictionary to under 300 words. An `_excludes` field (array of strings) can be optionally supplied to omit some key-value pairs from being shown to the model. The omitted fields still show up in the citation object. The "_excludes" field aren't passed to the model. See [Document Mode](https://docs.cohere.com/docs/retrieval-augmented-generation-rag#document-mode) guide from Cohere docs. |
-|`search_queries_only` |`boolean` |`false` |When `true`, the response only contains a list of generated search queries, but no search takes place, and no reply from the model to the user's `message` is generated.|
-|`stream` |`boolean` |`false` |When `true`, the response is a JSON stream of events. The final event contains the complete response, and has an `event_type` of `"stream-end"`. Streaming is beneficial for user interfaces that render the contents of the response piece by piece, as it gets generated.|
-|`max_tokens` |`integer` |None |The maximum number of tokens the model generates as part of the response. Note: Setting a low value might result in incomplete generations. If not specified, generates tokens until end of sequence.|
-|`temperature` |`float` |`0.3` |Use a lower value to decrease randomness in the response. Randomness can be further maximized by increasing the value of the `p` parameter. Min value is 0, and max is 2. |
-|`p` |`float` |`0.75` |Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable. If both p and k are enabled, p acts after k. min value of 0.01, max value of 0.99.|
-|`k` |`float` |`0` |Specify the number of token choices the model uses to generate the next token. If both p and k are enabled, p acts after k. Min value is 0, max value is 500.|
-|`prompt_truncation` |`enum string` |`OFF` |Accepts `AUTO_PRESERVE_ORDER`, `AUTO`, `OFF`. Dictates how the prompt is constructed. With `prompt_truncation` set to `AUTO_PRESERVE_ORDER`, some elements from `chat_history` and `documents` are dropped to construct a prompt that fits within the model's context length limit. During this process, the order of the documents and chat history are preserved. With `prompt_truncation` set to "OFF", no elements are dropped.|
-|`stop_sequences` |`array of strings` |`None` |The generated text is cut at the end of the earliest occurrence of a stop sequence. The sequence is included the text. |
-|`frequency_penalty` |`float` |`0` |Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation. Min value of 0.0, max value of 1.0.|
-|`presence_penalty` |`float` |`0` |Used to reduce repetitiveness of generated tokens. Similar to `frequency_penalty`, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies. Min value of 0.0, max value of 1.0.|
-|`seed` |`integer` |`None` |If specified, the backend makes a best effort to sample tokens deterministically, such that repeated requests with the same seed and parameters should return the same result. However, determinism can't be guaranteed.|
-|`return_prompt` |`boolean ` |`false ` |Returns the full prompt that was sent to the model when `true`. |
-|`tools` |`array of objects` |`None` |_Field is subject to changes._ A list of available tools (functions) that the model might suggest invoking before producing a text response. When `tools` is passed (without `tool_results`), the `text` field in the response is `""` and the `tool_calls` field in the response is populated with a list of tool calls that need to be made. If no calls need to be made, the `tool_calls` array is empty.|
-|`tool_results` |`array of objects` |`None` |_Field is subject to changes._ A list of results from invoking tools recommended by the model in the previous chat turn. Results are used to produce a text response and is referenced in citations. When using `tool_results`, `tools` must be passed as well. Each tool_result contains information about how it was invoked, and a list of outputs in the form of dictionaries. Cohere's unique fine-grained citation logic requires the output to be a list. In case the output is just one item, for example, `{"status": 200}`, still wrap it inside a list. |
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128 K.
-The `chat_history` object requires the following fields:
+We recommend using Command R+ for those workflows that lean on complex retrieval augmented generation (RAG) functionality and multi-step tool use (agents).
-|Key |Type |Description |
-||||
-|`role` |`enum string` |Takes `USER`, `SYSTEM`, or `CHATBOT`. |
-|`message` |`string` |Text contents of the message. |
-The `documents` object has the following optional fields:
+The following models are available:
-|Key |Type |Default| Description |
-|||||
-|`id` |`string` |`None` |Can be supplied to identify the document in the citations. This field isn't passed to the model. |
-|`_excludes` |`array of strings` |`None`| Can be optionally supplied to omit some key-value pairs from being shown to the model. The omitted fields still show up in the citation object. The `_excludes` field isn't passed to the model. |
+* [Cohere-command-r-plus](https://aka.ms/azureai/landing/Cohere-command-r-plus)
-#### v1/chat response schema
-Response fields are fully documented on [Cohere's Chat API reference](https://docs.cohere.com/reference/chat). The response object always contains:
+# [Cohere Command R](#tab/cohere-command-r)
-|Key |Type |Description |
-||||
-|`response_id` |`string` |Unique identifier for chat completion. |
-|`generation_id` |`string` |Unique identifier for chat completion, used with Feedback endpoint on Cohere's platform. |
-|`text` |`string` |Model's response to chat message input. |
-|`finish_reason` |`enum string` |Why the generation was completed. Can be any of the following values: `COMPLETE`, `ERROR`, `ERROR_TOXIC`, `ERROR_LIMIT`, `USER_CANCEL` or `MAX_TOKENS` |
-|`token_count` |`integer` |Count of tokens used. |
-|`meta` |`string` |API usage data, including current version and billable tokens. |
+Command R is a large language model optimized for various use cases, including reasoning, summarization, and question answering.
-<br/>
+* **Model Architecture**: Both Command R and Command R+ are autoregressive language models that use an optimized transformer architecture. After pre-training, the models use supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
+* **Languages covered**: The models are optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, simplified Chinese, and Arabic.
+* **Pre-training data also included the following 13 languages:** Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+* **Context length:** Command R and Command R+ support a context length of 128K.
-#### Documents
-If `documents` are specified in the request, there are two other fields in the response:
+Command R is great for simpler retrieval augmented generation (RAG) and single-step tool use tasks. It's also great for use in applications where price is a major consideration.
-|Key |Type |Description |
-||||
-|`documents ` |`array of objects` |Lists the documents that were cited in the response. |
-|`citations` |`array of objects` |Specifies which part of the answer was found in a given document. |
-`citations` is an array of objects with the following required fields:
+The following models are available:
-|Key |Type |Description |
-||||
-|`start` |`integer` |The index of text that the citation starts at, counting from zero. For example, a generation of `Hello, world!` with a citation on `world` would have a start value of `7`. This is because the citation starts at `w`, which is the seventh character. |
-|`end` |`integer` |The index of text that the citation ends after, counting from zero. For example, a generation of `Hello, world!` with a citation on `world` would have an end value of `11`. This is because the citation ends after `d`, which is the eleventh character. |
-|`text` |`string` |The text of the citation. For example, a generation of `Hello, world!` with a citation of `world` would have a text value of `world`. |
-|`document_ids` |`array of strings` |Identifiers of documents cited by this section of the generated reply. |
+* [Cohere-command-r](https://aka.ms/azureai/landing/Cohere-command-r)
-#### Tools
-If `tools` are specified and invoked by the model, there's another field in the response:
-|Key |Type |Description |
-||||
-|`tool_calls ` |`array of objects` |Contains the tool calls generated by the model. Use it to invoke your tools. |
++
+> [!TIP]
+> Additionally, Cohere supports a tailored API for use with specific features of the model. To use the model-provider-specific API, check the [Cohere documentation](https://docs.cohere.com/reference/about) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Cohere Command chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Cohere Command chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use Azure AI Studio, the Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-`tool_calls` is an array of objects with the following fields:
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-|Key |Type |Description |
-||||
-|`name` |`string` |Name of the tool to call. |
-|`parameters` |`object` |The name and value of the parameters to use when invoking a tool. |
+### A REST client
-#### Search_queries_only
-If `search_queries_only=TRUE` is specified in the request, there are two other fields in the response:
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
-|Key |Type |Description |
-||||
-|`is_search_required` |`boolean` |Instructs the model to generate a search query. |
-|`search_queries` |`array of objects` |Object that contains a list of search queries. |
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, `eastus2`).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
-`search_queries` is an array of objects with the following fields:
+## Work with chat completions
-|Key |Type |Description |
-||||
-|`text` |`string` |The text of the search query. |
-|`generation_id` |`string` |Unique identifier for the generated search query. Useful for submitting feedback. |
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
-#### Examples
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Cohere Command chat models.
-##### Chat - Completions
-The following example is a sample request call to get chat completions from the Cohere Command model. Use when generating a chat completion.
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
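For example, in a bash shell you can store these values as environment variables before issuing requests. The values shown are placeholders, and the environment variable names match the ones used elsewhere in this article:

```bash
# Placeholder values; use the endpoint URL and key of your own deployment.
export AZURE_INFERENCE_ENDPOINT="https://<your-host-name>.<your-azure-region>.inference.ai.azure.com"
export AZURE_INFERENCE_CREDENTIAL="<your-32-character-key>"

# Example request using the stored values (the /info route is described in the next section).
curl -X GET "$AZURE_INFERENCE_ENDPOINT/info" \
  -H "Authorization: Bearer $AZURE_INFERENCE_CREDENTIAL" \
  -H "Content-Type: application/json"
```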
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
+
+The response is as follows:
-Request:
```json
- {
- "chat_history": [
- {"role":"USER", "message": "What is an interesting new role in AI if I don't have an ML background"},
- {"role":"CHATBOT", "message": "You could explore being a prompt engineer!"}
- ],
- "message": "What are some skills I should have"
- }
+{
+ "model_name": "Cohere-command-r-plus",
+ "model_type": "chat-completions",
+ "model_provider_name": "Cohere"
+}
```
-Response:
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model. The request body is sent in a `POST` request to the `/chat/completions` route on the endpoint.
```json
- {
- "response_id": "09613f65-c603-41e6-94b3-a7484571ac30",
- "text": "Writing skills are very important for prompt engineering. Some other key skills are:\n- Creativity\n- Awareness of biases\n- Knowledge of how NLP models work\n- Debugging skills\n\nYou can also have some fun with it and try to create some interesting, innovative prompts to train an AI model that can then be used to create various applications.",
- "generation_id": "6d31a57f-4d94-4b05-874d-36d0d78c9549",
- "finish_reason": "COMPLETE",
- "token_count": {
- "prompt_tokens": 99,
- "response_tokens": 70,
- "total_tokens": 169,
- "billed_tokens": 151
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
},
- "meta": {
- "api_version": {
- "version": "1"
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Cohere-command-r-plus",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
},
- "billed_units": {
- "input_tokens": 81,
- "output_tokens": 70
- }
+ "finish_reason": "stop",
+ "logprobs": null
}
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
}
+}
```
-##### Chat - Grounded generation and RAG capabilities
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
-Command R and Command R+ are trained for RAG via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. We introduce that prompt template via the `documents` parameter. The document snippets should be chunks, rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings. The values can be text or semi-structured.
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
-Request:
```json
- {
- "message": "Where do the tallest penguins live?",
- "documents": [
+{
+ "messages": [
{
- "title": "Tall penguins",
- "snippet": "Emperor penguins are the tallest."
+ "role": "system",
+ "content": "You are a helpful assistant."
}, {
- "title": "Penguin habitats",
- "snippet": "Emperor penguins only live in Antarctica."
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
+
+You can visualize how streaming generates content:
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Cohere-command-r-plus",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
} ]
- }
+}
```
-Response:
+The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
+ ```json
- {
- "response_id": "d7e72d2e-06c0-469f-8072-a3aa6bd2e3b2",
- "text": "Emperor penguins are the tallest species of penguin and they live in Antarctica.",
- "generation_id": "b5685d8d-00b4-48f1-b32f-baebabb563d8",
- "finish_reason": "COMPLETE",
- "token_count": {
- "prompt_tokens": 615,
- "response_tokens": 15,
- "total_tokens": 630,
- "billed_tokens": 22
- },
- "meta": {
- "api_version": {
- "version": "1"
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Cohere-command-r-plus",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
},
- "billed_units": {
- "input_tokens": 7,
- "output_tokens": 15
- }
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
},
- "citations": [
- {
- "start": 0,
- "end": 16,
- "text": "Emperor penguins",
- "document_ids": [
- "doc_0"
- ]
- },
- {
- "start": 69,
- "end": 80,
- "text": "Antarctica.",
- "document_ids": [
- "doc_1"
- ]
- }
- ],
- "documents": [
- {
- "id": "doc_0",
- "snippet": "Emperor penguins are the tallest.",
- "title": "Tall penguins"
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+ "temperature" :0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Cohere-command-r-plus",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
},
- {
- "id": "doc_1",
- "snippet": "Emperor penguins only live in Antarctica.",
- "title": "Penguin habitats"
- }
- ]
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
}
+}
```
-##### Chat - Tool Use
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
-If invoking tools or generating a response based on tool results, use the following parameters.
+#### Create JSON outputs
+
+Cohere Command chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
-Request:
```json
- {
- "message":"I'd like 4 apples and a fish please",
- "tools":[
- {
- "name":"personal_shopper",
- "description":"Returns items and requested volumes to purchase",
- "parameter_definitions":{
- "item":{
- "description":"the item requested to be purchased, in all caps eg. Bananas should be BANANAS",
- "type": "str",
- "required": true
- },
- "quantity":{
- "description": "how many of the items should be purchased",
- "type": "int",
- "required": true
- }
- }
- }
- ],
-
- "tool_results": [
+{
+ "messages": [
{
- "call": {
- "name": "personal_shopper",
- "parameters": {
- "item": "Apples",
- "quantity": 4
- },
- "generation_id": "cb3a6e8b-6448-4642-b3cd-b1cc08f7360d"
- },
- "outputs": [
- {
- "response": "Sale completed"
- }
- ]
+ "role": "system",
+ "content": "You are a helpful assistant that always generate responses in JSON format, using the following format: { \"answer\": \"response\" }"
}, {
- "call": {
- "name": "personal_shopper",
- "parameters": {
- "item": "Fish",
- "quantity": 1
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "response_format": { "type": "json_object" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718727522,
+ "model": "Cohere-command-r-plus",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "{\"answer\": \"There are approximately 7,117 living languages in the world today, according to the latest estimates. However, this number can vary as some languages become extinct and others are newly discovered or classified.\"}",
+ "tool_calls": null
},
- "generation_id": "cb3a6e8b-6448-4642-b3cd-b1cc08f7360d"
- },
- "outputs": [
- {
- "response": "Sale not completed"
- }
- ]
+ "finish_reason": "stop",
+ "logprobs": null
}
- ]
+ ],
+ "usage": {
+ "prompt_tokens": 39,
+ "total_tokens": 87,
+ "completion_tokens": 48
}
+}
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
```
-Response:
```json
- {
- "response_id": "fa634da2-ccd1-4b56-8308-058a35daa100",
- "text": "I've completed the sale for 4 apples. \n\nHowever, there was an error regarding the fish; it appears that there is currently no stock.",
- "generation_id": "f567e78c-9172-4cfa-beba-ee3c330f781a",
- "chat_history": [
- {
- "message": "I'd like 4 apples and a fish please",
- "response_id": "fa634da2-ccd1-4b56-8308-058a35daa100",
- "generation_id": "a4c5da95-b370-47a4-9ad3-cbf304749c04",
- "role": "User"
- },
- {
- "message": "I've completed the sale for 4 apples. \n\nHowever, there was an error regarding the fish; it appears that there is currently no stock.",
- "response_id": "fa634da2-ccd1-4b56-8308-058a35daa100",
- "generation_id": "f567e78c-9172-4cfa-beba-ee3c330f781a",
- "role": "Chatbot"
- }
- ],
- "finish_reason": "COMPLETE",
- "token_count": {
- "prompt_tokens": 644,
- "response_tokens": 31,
- "total_tokens": 675,
- "billed_tokens": 41
- },
- "meta": {
- "api_version": {
- "version": "1"
- },
- "billed_units": {
- "input_tokens": 10,
- "output_tokens": 31
- }
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
},
- "citations": [
- {
- "start": 5,
- "end": 23,
- "text": "completed the sale",
- "document_ids": [
- ""
- ]
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "logprobs": true
+}
+```
+
+### Use tools
+
+Cohere Command chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information between two cities.
++
+```json
+{
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "description": "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates"
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city"
+ }
},
- {
- "start": 113,
- "end": 132,
- "text": "currently no stock.",
- "document_ids": [
- ""
- ]
- }
- ],
- "documents": [
- {
- "response": "Sale completed"
- }
- ]
+ "required": [
+ "origin_city",
+ "destination_city"
+ ]
+ }
}
+}
```
-Once you run your function and received tool outputs, you can pass them back to the model to generate a response for the user.
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
+
+> [!NOTE]
+> Cohere-command-r-plus and Cohere-command-r require a tool's response to be valid JSON content formatted as a string. When constructing messages of type *Tool*, ensure the response is a valid JSON string, as shown in the following sketch.
+
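For example, a *Tool* message whose output is serialized as a JSON string might look like the following sketch. The `tool_call_id` value and the output text are illustrative:

```json
{
  "role": "tool",
  "content": "{ \"info\": \"There are no flights available from Miami to Seattle.\" }",
  "tool_call_id": "abc0DeFgH"
}
```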
+Prompt the model to book flights with the help of this function:
-Request:
```json
- {
- "message":"I'd like 4 apples and a fish please",
- "tools":[
- {
- "name":"personal_shopper",
- "description":"Returns items and requested volumes to purchase",
- "parameter_definitions":{
- "item":{
- "description":"the item requested to be purchased, in all caps eg. Bananas should be BANANAS",
- "type": "str",
- "required": true
- },
- "quantity":{
- "description": "how many of the items should be purchased",
- "type": "int",
- "required": true
- }
- }
- }
- ],
-
- "tool_results": [
+{
+ "messages": [
{
- "call": {
- "name": "personal_shopper",
- "parameters": {
- "item": "Apples",
- "quantity": 4
- },
- "generation_id": "cb3a6e8b-6448-4642-b3cd-b1cc08f7360d"
- },
- "outputs": [
- {
- "response": "Sale completed"
- }
- ]
+ "role": "system",
+      "content": "You are a helpful assistant that helps users find information about traveling, how to get to places, and the different transportation options. You care about the environment and you always have that in mind when answering inquiries"
}, {
- "call": {
- "name": "personal_shopper",
- "parameters": {
- "item": "Fish",
- "quantity": 1
- },
- "generation_id": "cb3a6e8b-6448-4642-b3cd-b1cc08f7360d"
- },
- "outputs": [
- {
- "response": "Sale not completed"
+ "role": "user",
+ "content": "When is the next flight from Miami to Seattle?"
+ }
+ ],
+ "tool_choice": "auto",
+ "tools": [
+ {
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "description": "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates"
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city"
+ }
+ },
+ "required": [
+ "origin_city",
+ "destination_city"
+ ]
+ }
}
- ]
} ]
- }
+}
```
-Response:
+You can inspect the response to find out whether a tool needs to be called by checking the finish reason. Keep in mind that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
+ ```json
- {
- "response_id": "fa634da2-ccd1-4b56-8308-058a35daa100",
- "text": "I've completed the sale for 4 apples. \n\nHowever, there was an error regarding the fish; it appears that there is currently no stock.",
- "generation_id": "f567e78c-9172-4cfa-beba-ee3c330f781a",
- "chat_history": [
- {
- "message": "I'd like 4 apples and a fish please",
- "response_id": "fa634da2-ccd1-4b56-8308-058a35daa100",
- "generation_id": "a4c5da95-b370-47a4-9ad3-cbf304749c04",
- "role": "User"
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726007,
+ "model": "Cohere-command-r-plus",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {
+ "id": "abc0dF1gh",
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "arguments": "{\"origin_city\": \"Miami\", \"destination_city\": \"Seattle\"}",
+ "call_id": null
+ }
+ }
+ ]
},
- {
- "message": "I've completed the sale for 4 apples. \n\nHowever, there was an error regarding the fish; it appears that there is currently no stock.",
- "response_id": "fa634da2-ccd1-4b56-8308-058a35daa100",
- "generation_id": "f567e78c-9172-4cfa-beba-ee3c330f781a",
- "role": "Chatbot"
- }
- ],
- "finish_reason": "COMPLETE",
- "token_count": {
- "prompt_tokens": 644,
- "response_tokens": 31,
- "total_tokens": 675,
- "billed_tokens": 41
+ "finish_reason": "tool_calls",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 190,
+ "total_tokens": 226,
+ "completion_tokens": 36
+ }
+}
+```
+
+To continue, append this message to the chat history:
+
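This is the assistant message returned in the previous response, including its `tool_calls` entry. The identifiers shown here are illustrative and must match the `id` values the model returned:

```json
{
  "role": "assistant",
  "content": "",
  "tool_calls": [
    {
      "id": "abc0DeFgH",
      "type": "function",
      "function": {
        "name": "get_flight_info",
        "arguments": "{\"origin_city\": \"Miami\", \"destination_city\": \"Seattle\"}",
        "call_id": null
      }
    }
  ]
}
```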
+Now, it's time to call the appropriate function to handle the tool call. For each tool call indicated in the response, invoke the corresponding function with the arguments the model supplied, and append the function's output to the chat history as a message with the `tool` role.
+
+The next request contains the complete conversation, including the tool's output, so the model can generate the final response:
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+      "content": "You are a helpful assistant that helps users find information about traveling, how to get to places, and the different transportation options. You care about the environment and you always have that in mind when answering inquiries"
},
- "meta": {
- "api_version": {
- "version": "1"
- },
- "billed_units": {
- "input_tokens": 10,
- "output_tokens": 31
- }
+ {
+ "role": "user",
+ "content": "When is the next flight from Miami to Seattle?"
},
- "citations": [
- {
- "start": 5,
- "end": 23,
- "text": "completed the sale",
- "document_ids": [
- ""
- ]
- },
- {
- "start": 113,
- "end": 132,
- "text": "currently no stock.",
- "document_ids": [
- ""
- ]
+ {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {
+ "id": "abc0DeFgH",
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "arguments": "{\"origin_city\": \"Miami\", \"destination_city\": \"Seattle\"}",
+ "call_id": null
+ }
+ }
+ ]
+ },
+ {
+ "role": "tool",
+      "content": "{ \"info\": \"There are no flights available from Miami to Seattle. You should take a train, especially if it helps to reduce CO2 emissions.\" }",
+ "tool_call_id": "abc0DeFgH"
+ }
+ ],
+ "tool_choice": "auto",
+ "tools": [
+ {
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "description": "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ "parameters":{
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates"
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city"
+ }
+ },
+ "required": ["origin_city", "destination_city"]
}
- ],
- "documents": [
- {
- "response": "Sale completed"
}
- ]
- }
+ }
+ ]
+}
```
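The model then answers by using the tool's output. The following response is an illustrative sketch; identifiers, token counts, and wording will vary:

```json
{
  "id": "0a1234b5de6789f01gh2i345j6789klm",
  "object": "chat.completion",
  "created": 1718726599,
  "model": "Cohere-command-r-plus",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "There are no flights available from Miami to Seattle at the moment. Consider taking the train instead, which also helps reduce CO2 emissions.",
        "tool_calls": null
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 246,
    "total_tokens": 282,
    "completion_tokens": 36
  }
}
```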
-##### Chat - Search queries
-If you're building a RAG agent, you can also use Cohere's Chat API to get search queries from Command. Specify `search_queries_only=TRUE` in your request.
+### Apply content safety
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-Request:
```json
- {
- "message": "Which lego set has the greatest number of pieces?",
- "search_queries_only": true
- }
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are an AI assistant that helps people find information."
+ },
+ {
+ "role": "user",
+ "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ }
+ ]
+}
```
-Response:
```json
- {
- "response_id": "5e795fe5-24b7-47b4-a8bc-b58a68c7c676",
- "text": "",
- "finish_reason": "COMPLETE",
- "meta": {
- "api_version": {
- "version": "1"
- }
- },
- "is_search_required": true,
- "search_queries": [
- {
- "text": "lego set with most pieces",
- "generation_id": "a086696b-ad8e-4d15-92e2-1c57a3526e1c"
- }
- ]
+{
+ "error": {
+ "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
+ "type": null,
+ "param": "prompt",
+ "code": "content_filter",
+ "status": 400
}
+}
```
-##### More inference examples
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-| **Package** | **Sample Notebook** |
-|-|-|
-| CLI using CURL and Python web requests - Command R | [command-r.ipynb](https://aka.ms/samples/cohere-command-r/webrequests)|
-| CLI using CURL and Python web requests - Command R+ | [command-r-plus.ipynb](https://aka.ms/samples/cohere-command-r-plus/webrequests)|
-| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/samples/cohere-command/openaisdk) |
-| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere/langchain) |
-| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-python-sdk) |
-| LiteLLM SDK | [litellm.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |
-##### Retrieval Augmented Generation (RAG) and tool use samples
-**Description** | **Package** | **Sample Notebook**
|--|--
-Create a local Facebook AI similarity search (FAISS) vector index, using Cohere embeddings - Langchain|`langchain`, `langchain_cohere`|[cohere_faiss_langchain_embed.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere_faiss_langchain_embed.ipynb)
-Use Cohere Command R/R+ to answer questions from data in local FAISS vector index - Langchain|`langchain`, `langchain_cohere`|[command_faiss_langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_faiss_langchain.ipynb)
-Use Cohere Command R/R+ to answer questions from data in AI search vector index - Langchain|`langchain`, `langchain_cohere`|[cohere-aisearch-langchain-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-langchain-rag.ipynb)
-Use Cohere Command R/R+ to answer questions from data in AI search vector index - Cohere SDK| `cohere`, `azure_search_documents`|[cohere-aisearch-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-rag.ipynb)
-Command R+ tool/function calling, using LangChain|`cohere`, `langchain`, `langchain_cohere`|[command_tools-langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_tools-langchain.ipynb)
+## More inference examples
-## Cost and quotas
+For more examples of how to use Cohere, see the following examples and tutorials:
-### Cost and quota considerations for models deployed as a serverless API
+| Description | Language | Sample |
+|-|-|--|
+| Web requests | Bash | [Command-R](https://aka.ms/samples/cohere-command-r/webrequests) - [Command-R+](https://aka.ms/samples/cohere-command-r-plus/webrequests) |
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+| OpenAI SDK (experimental) | Python | [Link](https://aka.ms/samples/cohere-command/openaisdk) |
+| LangChain | Python | [Link](https://aka.ms/samples/cohere/langchain) |
+| Cohere SDK | Python | [Link](https://aka.ms/samples/cohere-python-sdk) |
+| LiteLLM SDK | Python | [Link](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |
-Cohere models deployed as a serverless API with pay-as-you-go billing are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+#### Retrieval Augmented Generation (RAG) and tool use samples
-Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+| Description | Packages | Sample |
+|-||--|
+| Create a local Facebook AI similarity search (FAISS) vector index, using Cohere embeddings - Langchain | `langchain`, `langchain_cohere` | [cohere_faiss_langchain_embed.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere_faiss_langchain_embed.ipynb) |
+| Use Cohere Command R/R+ to answer questions from data in local FAISS vector index - Langchain |`langchain`, `langchain_cohere` | [command_faiss_langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_faiss_langchain.ipynb) |
+| Use Cohere Command R/R+ to answer questions from data in AI search vector index - Langchain | `langchain`, `langchain_cohere` | [cohere-aisearch-langchain-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-langchain-rag.ipynb) |
+| Use Cohere Command R/R+ to answer questions from data in AI search vector index - Cohere SDK | `cohere`, `azure_search_documents` | [cohere-aisearch-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-rag.ipynb) |
+| Command R+ tool/function calling, using LangChain | `cohere`, `langchain`, `langchain_cohere` | [command_tools-langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_tools-langchain.ipynb) |
-For more information on how to track costs, see [monitor costs for models offered throughout the Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
-Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+## Cost and quota considerations for the Cohere family of models deployed as serverless API endpoints
-## Content filtering
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
-Models deployed as a serverless API with pay-as-you-go billing are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
+Cohere models deployed as a serverless API are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+
+Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+
+For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
## Related content -- [What is Azure AI Studio?](../what-is-ai-studio.md)-- [Azure AI FAQ article](../faq.yml)-- [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)+
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Cohere Embed https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-cohere-embed.md
Title: How to deploy Cohere Embed models with Azure AI Studio
+ Title: How to use Cohere Embed V3 models with Azure AI Studio
-description: Learn how to deploy Cohere Embed models with Azure AI Studio.
-
+description: Learn how to use Cohere Embed V3 models with Azure AI Studio.
+ Previously updated : 5/21/2024 Last updated : 08/08/2024
+reviewer: shubhirajMsft
-+
+zone_pivot_groups: azure-ai-model-catalog-samples-embeddings
-# How to deploy Cohere Embed models with Azure AI Studio
+# How to use Cohere Embed V3 models with Azure AI Studio
+In this article, you learn about Cohere Embed V3 models and how to use them with Azure AI Studio.
+The Cohere family of models includes models optimized for different tasks, such as chat completions, embeddings, and rerank, covering use cases that include reasoning, summarization, and question answering.
-In this article, you learn how to use Azure AI Studio to deploy the Cohere Embed models as serverless APIs with pay-as-you-go token-based billing.
-Cohere offers two Embed models in [Azure AI Studio](https://ai.azure.com). These models are available as serverless APIs with pay-as-you-go token-based billing. You can browse the Cohere family of models in the [Model Catalog](model-catalog.md) by filtering on the Cohere collection.
-## Cohere Embed models
-In this section, you learn about the two Cohere Embed models that are available in the model catalog:
+## Cohere embedding models
-* Cohere Embed v3 - English
-* Cohere Embed v3 - Multilingual
+The Cohere family of models for embeddings includes the following models:
-You can browse the Cohere family of models in the [Model Catalog](model-catalog-overview.md) by filtering on the Cohere collection.
+# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
-### Cohere Embed v3 - English
-Cohere Embed English is the market's leading text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English has top performance on the HuggingFace MTEB benchmark and performs well on use-cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
+Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace MTEB (Massive Text Embedding Benchmark) and on use cases for various industries, such as Finance, Legal, and General-Purpose Corpora. Embed English also has the following attributes:
* Embed English has 1,024 dimensions.
* Context window of the model is 512 tokens
-### Cohere Embed v3 - Multilingual
-Cohere Embed Multilingual is the market's leading text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports 100+ languages and can be used to search within a language (for example, search with a French query on French documents) and across languages (for example, search with an English query on Chinese documents). Embed multilingual has state-of-the-art performance on multilingual benchmarks such as Miracl. Embed Multilingual also has the following attributes:
+
+# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
+
+Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed Multilingual performs well on multilingual benchmarks such as MIRACL. Embed Multilingual also has the following attributes:
* Embed Multilingual has 1,024 dimensions.
* Context window of the model is 512 tokens
-## Deploy as a serverless API
-Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
++
+## Prerequisites
+
+To use Cohere Embed V3 models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
-The previously mentioned Cohere models can be deployed as a service with pay-as-you-go billing and are offered by Cohere through the Microsoft Azure Marketplace. Cohere can change or update the terms of use and pricing of these models.
+**Deployment to serverless APIs**
-### Prerequisites
+Cohere Embed V3 models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.-- An [AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for Cohere Embed is only available with hubs created in these regions:
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
- * East US
- * East US 2
- * North Central US
- * South Central US
- * West US
- * West US 3
- * Sweden Central
-
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-- An [AI Studio project](../how-to/create-projects.md) in Azure AI Studio.-- Azure role-based access controls are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
+### The inference package installed
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
-### Create a new deployment
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
-The following steps demonstrate the deployment of Cohere Embed v3 - English, but you can use the same steps to deploy Cohere Embed v3 - Multilingual by replacing the model name.
+```bash
+pip install azure-ai-inference
+```
-To create a deployment:
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Select **Model catalog** from the left sidebar.
-1. Search for *Cohere*.
-1. Select **Cohere-embed-v3-english** to open the Model Details page.
+> [!TIP]
+> Additionally, Cohere supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check [Cohere documentation](https://docs.cohere.com/reference/about).
- :::image type="content" source="../media/deploy-monitor/cohere-embed/embed-english-deploy-directly-from-catalog.png" alt-text="A screenshot showing how to access the model details page by going through the model catalog." lightbox="../media/deploy-monitor/cohere-embed/embed-english-deploy-directly-from-catalog.png":::
+## Work with embeddings
-1. Select **Deploy** to open a serverless API deployment window for the model.
-1. Alternatively, you can initiate a deployment by starting from your project in AI Studio.
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with an embeddings model.
- 1. From the left sidebar of your project, select **Components** > **Deployments**.
- 1. Select **+ Create deployment**.
- 1. Search for and select **Cohere-embed-v3-english**. to open the Model Details page.
+### Create a client to consume the model
- :::image type="content" source="../media/deploy-monitor/cohere-embed/embed-english-deploy-start-from-project.png" alt-text="A screenshot showing how to access the model details page by going through the Deployments page in your project." lightbox="../media/deploy-monitor/cohere-embed/embed-english-deploy-start-from-project.png":::
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
- 1. Select **Confirm** to open a serverless API deployment window for the model.
- :::image type="content" source="../media/deploy-monitor/cohere-embed/embed-english-deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model with the pay-as-you-go option." lightbox="../media/deploy-monitor/cohere-embed/embed-english-deploy-pay-as-you-go.png":::
+```python
+import os
+from azure.ai.inference import EmbeddingsClient
+from azure.core.credentials import AzureKeyCredential
-1. Select the project in which you want to deploy your model. To deploy the model your project must be in the *EastUS2* or *Sweden Central* region.
-1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
-1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
-1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering. This step requires that your account has the **Azure AI Developer role** permissions on the resource group, as listed in the prerequisites. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending. Currently, you can have only one deployment for each model within a project.
-1. Once you subscribe the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. If this scenario applies to you, there's a **Continue to deploy** option to select.
+model = EmbeddingsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
- :::image type="content" source="../media/deploy-monitor/cohere-embed/embed-english-existing-subscription.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="../media/deploy-monitor/cohere-embed/embed-english-existing-subscription.png":::
+### Get the model's capabilities
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
- :::image type="content" source="../media/deploy-monitor/cohere-embed/embed-english-deployment-name.png" alt-text="A screenshot showing how to indicate the name of the deployment you want to create." lightbox="../media/deploy-monitor/cohere-embed/embed-english-deployment-name.png":::
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
-1. Select **Open in playground** to start interacting with the model.
-1. Return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [reference](#embed-api-reference-for-cohere-embed-models-deployed-as-a-service) section.
-1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
+```python
+model_info = model.get_model_info()
+```
-To learn about billing for the Cohere models deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Cohere models deployed as a service](#cost-and-quota-considerations-for-models-deployed-as-a-service).
+The response is as follows:
-### Consume the Cohere Embed models as a service
-These models can be consumed using the embed API.
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
-1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
+```console
+Model name: Cohere-embed-v3-english
+Model type": embeddings
+Model provider name": Cohere
+```
-1. Find and select the deployment you created.
+### Create embeddings
-1. Copy the **Target** URL and the **Key** value.
+Create an embedding request to see the output of the model.
-1. Cohere exposes two routes for inference with the Embed v3 - English and Embed v3 - Multilingual models. `v1/embeddings` adheres to the Azure AI Generative Messages API schema, and `v1/embed` supports Cohere's native API schema.
+```python
+response = model.embed(
+ input=["The ultimate answer to the question of life"],
+)
+```
- For more information on using the APIs, see the [reference](#embed-api-reference-for-cohere-embed-models-deployed-as-a-service) section.
+> [!TIP]
+> The context window for Cohere Embed V3 models is 512 tokens. Make sure that you don't exceed this limit when creating embeddings.
-## Embed API reference for Cohere Embed models deployed as a service
+The response is as follows, where you can see the model's usage statistics:
-Cohere Embed v3 - English and Embed v3 - Multilingual accept both the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/embeddings` and the native [Cohere Embed v3 API](#cohere-embed-v3) on `/embed`.
-### Azure AI Model Inference API
+```python
+import numpy as np
-The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Embeddings](../reference/reference-model-inference-embeddings.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
+for embed in response.data:
+    print("Embedding of size:", np.asarray(embed.embedding).shape)
-### Cohere Embed v3
+print("Model:", response.model)
+print("Usage:", response.usage)
+```
-The following contains details about Cohere Embed v3 API.
+It can be useful to compute embeddings in input batches. The parameter `input` can be a list of strings, where each string is a different input. In turn, the response is a list of embeddings, where each embedding corresponds to the input in the same position.
-#### Request
+```python
+response = model.embed(
+ input=[
+ "The ultimate answer to the question of life",
+ "The largest planet in our solar system is Jupiter",
+ ],
+)
```
- POST /v1/embed HTTP/1.1
- Host: <DEPLOYMENT_URI>
- Authorization: Bearer <TOKEN>
- Content-type: application/json
+
+The response is as follows, where you can see the model's usage statistics:
++
+```python
+import numpy as np
+
+for embed in response.data:
+    print("Embedding of size:", np.asarray(embed.embedding).shape)
+
+print("Model:", response.model)
+print("Usage:", response.usage)
```
-#### v1/embed request schema
+> [!TIP]
+> Cohere Embed V3 models can take batches of 1024 at a time. When you create batches, make sure that you don't exceed this limit. One way to batch inputs is shown in the sketch after this tip.
+
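The following is a minimal batching sketch. It assumes the `model` client created earlier in this article; the sample documents and the batch size are illustrative and should be adjusted to your data and the service limits:

```python
# A minimal batching sketch. Assumes the `model` EmbeddingsClient created earlier;
# the sample documents and batch size are illustrative.
documents = [f"Sample document number {i}" for i in range(2500)]

batch_size = 1024  # assumed per-request limit from the tip above; adjust as needed
all_embeddings = []

for start in range(0, len(documents), batch_size):
    batch = documents[start : start + batch_size]
    response = model.embed(input=batch)
    # Embeddings are returned in the same order as the inputs.
    all_embeddings.extend(item.embedding for item in response.data)

print("Computed", len(all_embeddings), "embeddings")
```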
+#### Create different types of embeddings
-Cohere Embed v3 - English and Embed v3 - Multilingual accept the following parameters for a `v1/embed` API call:
+Cohere Embed V3 models can generate multiple embeddings for the same input depending on how you plan to use them. This capability allows you to retrieve more accurate embeddings for RAG patterns.
-|Key |Type |Default |Description |
-|||||
-|`texts` |`array of strings` |Required |An array of strings for the model to embed. Maximum number of texts per call is 96. We recommend reducing the length of each text to be under 512 tokens for optimal quality. |
-|`input_type` |`enum string` |Required |Prepends special tokens to differentiate each type from one another. You shouldn't mix different types together, except when mixing types for search and retrieval. In this case, embed your corpus with the `search_document` type and embed queries with the `search_query` type. <br/> `search_document` – In search use-cases, use search_document when you encode documents for embeddings that you store in a vector database. <br/> `search_query` – Use search_query when querying your vector database to find relevant documents. <br/> `classification` – Use classification when using embeddings as an input to a text classifier. <br/> `clustering` – Use clustering to cluster the embeddings.|
-|`truncate` |`enum string` |`NONE` |`NONE` – Returns an error when the input exceeds the maximum input token length. <br/> `START` – Discards the start of the input. <br/> `END` – Discards the end of the input. |
-|`embedding_types` |`array of strings` |`float` |Specifies the types of embeddings you want to get back. Can be one or more of the following types. `float`, `int8`, `uint8`, `binary`, `ubinary` |
+The following example shows how to create an embedding for a document that will be stored in a vector database:
-#### v1/embed response schema
-Cohere Embed v3 - English and Embed v3 - Multilingual include the following fields in the response:
+```python
+from azure.ai.inference.models import EmbeddingInputType
-|Key |Type |Description |
-||||
-|`response_type` |`enum` |The response type. Returns `embeddings_floats` when `embedding_types` isn't specified, or returns `embeddings_by_type` when `embeddings_types` is specified. |
-|`id` |`integer` |An identifier for the response. |
-|`embeddings` |`array` or `array of objects` |An array of embeddings, where each embedding is an array of floats with 1,024 elements. The length of the embeddings array is the same as the length of the original texts array.|
-|`texts` |`array of strings` |The text entries for which embeddings were returned. |
-|`meta` |`string` |API usage data, including current version and billable tokens. |
+response = model.embed(
+ input=["The answer to the ultimate question of life, the universe, and everything is 42"],
+ input_type=EmbeddingInputType.DOCUMENT,
+)
+```
-For more information, see [https://docs.cohere.com/reference/embed](https://docs.cohere.com/reference/embed).
+When you work on a query to retrieve such a document, you can use the following code snippet to create the embedding for the query and maximize retrieval performance.
-### v1/embed examples
-#### embeddings_floats Response
+```python
+from azure.ai.inference.models import EmbeddingInputType
-Request:
+response = model.embed(
+ input=["What's the ultimate meaning of life?"],
+ input_type=EmbeddingInputType.QUERY,
+)
+```
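+
+To see how the two input types work together in a retrieval scenario, the following sketch is an illustration only; it assumes `numpy` and the `model` client from the earlier snippets, embeds a small corpus as documents, embeds a question as a query, and ranks the documents by cosine similarity:
+
+```python
+import numpy as np
+from azure.ai.inference.models import EmbeddingInputType
+
+documents = [
+    "The answer to the ultimate question of life, the universe, and everything is 42",
+    "The largest planet in our solar system is Jupiter",
+]
+
+# Embed the corpus with the DOCUMENT input type and the question with the QUERY input type.
+doc_response = model.embed(input=documents, input_type=EmbeddingInputType.DOCUMENT)
+query_response = model.embed(
+    input=["What's the ultimate meaning of life?"],
+    input_type=EmbeddingInputType.QUERY,
+)
+
+doc_vectors = np.asarray([item.embedding for item in doc_response.data])
+query_vector = np.asarray(query_response.data[0].embedding)
+
+# Cosine similarity: higher scores mean the document is more relevant to the query.
+scores = doc_vectors @ query_vector / (
+    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
+)
+
+for score, document in sorted(zip(scores, documents), reverse=True):
+    print(f"{score:.3f}  {document}")
+```
+
+In a real application, you would store the document vectors in a vector database and compute only the query embedding at search time.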
-```json
- {
- "input_type": "clustering",
- "truncate": "START",
- "texts":["hi", "hello"]
+Cohere Embed V3 models can optimize the embeddings based on their use case.
++++
+## Cohere embedding models
+
+The Cohere family of models for embeddings includes the following models:
+
+# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
+
+Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace MTEB (Massive Text Embedding Benchmark) and on use cases across various industries, such as finance, legal, and general-purpose corpora. Embed English also has the following attributes:
+
+* Embed English has 1,024 dimensions.
+* The context window of the model is 512 tokens.
++
+# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
+
+Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed Multilingual performs well on multilingual benchmarks such as MIRACL. Embed Multilingual also has the following attributes:
+
+* Embed Multilingual has 1,024 dimensions.
+* The context window of the model is 512 tokens.
++++
+## Prerequisites
+
+To use Cohere Embed V3 models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Cohere Embed V3 models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+> [!TIP]
+> Additionally, Cohere offers a tailored API for specific features of the model. To use this model-provider-specific API, see the [Cohere documentation](https://docs.cohere.com/reference/about).
+
+## Work with embeddings
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with an embeddings model.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name);
+console.log("Model type: ", model_info.body.model_type);
+console.log("Model provider name: ", model_info.body.model_provider_name);
+```
+
+```console
+Model name: Cohere-embed-v3-english
+Model type": embeddings
+Model provider name": Cohere
+```
+
+### Create embeddings
+
+Create an embedding request to see the output of the model.
+
+```javascript
+var response = await client.path("/embeddings").post({
+ body: {
+ input: ["The ultimate answer to the question of life"],
}
+});
```
-Response:
+> [!TIP]
+> The context window for Cohere Embed V3 models is 512 tokens. Make sure that you don't exceed this limit when creating embeddings.
-```json
- {
- "id": "da7a104c-e504-4349-bcd4-4d69dfa02077",
- "texts": [
- "hi",
- "hello"
- ],
- "embeddings": [
- [
- ...
- ],
- [
- ...
- ]
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log(response.body.data[0].embedding);
+console.log(response.body.model);
+console.log(response.body.usage);
+```
+
+It can be useful to compute embeddings in input batches. The `input` parameter can be a list of strings, where each string is a different input. In turn, the response is a list of embeddings, where each embedding corresponds to the input in the same position.
++
+```javascript
+var response = await client.path("/embeddings").post({
+ body: {
+ input: [
+ "The ultimate answer to the question of life",
+ "The largest planet in our solar system is Jupiter",
],
- "meta": {
- "api_version": {
- "version": "1"
- },
- "billed_units": {
- "input_tokens": 2
- }
- },
- "response_type": "embeddings_floats"
}
+});
```
-#### Embeddings_by_types response
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+for (const item of response.body.data) {
+    console.log("Embedding of size:", item.embedding.length);
+}
+console.log(response.body.model);
+console.log(response.body.usage);
+```
+
+> [!TIP]
+> Cohere Embed V3 models can take batches of up to 1,024 inputs at a time. When creating batches, make sure that you don't exceed this limit.
+
+#### Create different types of embeddings
+
+Cohere Embed V3 models can generate multiple embeddings for the same input depending on how you plan to use them. This capability allows you to retrieve more accurate embeddings for RAG patterns.
+
+The following example shows how to create an embedding for a document that will be stored in a vector database:
++
+```javascript
+var response = await client.path("/embeddings").post({
+ body: {
+ input: ["The answer to the ultimate question of life, the universe, and everything is 42"],
+ input_type: "document",
+ }
+});
+```
+
+When you work on a query to retrieve such a document, you can use the following code snippet to create the embedding for the query and maximize retrieval performance.
++
+```javascript
+var response = await client.path("/embeddings").post({
+ body: {
+ input: ["What's the ultimate meaning of life?"],
+ input_type: "query",
+ }
+});
+```
+
+Cohere Embed V3 models can optimize the embeddings based on their use case.
++++
+## Cohere embedding models
+
+The Cohere family of models for embeddings includes the following models:
+
+# [Cohere Embed v3 - English](#tab/cohere-embed-v3-english)
+
+Cohere Embed English is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed English performs well on the HuggingFace MTEB (Massive Text Embedding Benchmark) and on use cases across various industries, such as finance, legal, and general-purpose corpora. Embed English also has the following attributes:
+
+* Embed English has 1,024 dimensions.
+* The context window of the model is 512 tokens.
++
+# [Cohere Embed v3 - Multilingual](#tab/cohere-embed-v3-multilingual)
+
+Cohere Embed Multilingual is a text representation model used for semantic search, retrieval-augmented generation (RAG), classification, and clustering. Embed Multilingual supports more than 100 languages and can be used to search within a language (for example, to search with a French query on French documents) and across languages (for example, to search with an English query on Chinese documents). Embed Multilingual performs well on multilingual benchmarks such as MIRACL. Embed Multilingual also has the following attributes:
+
+* Embed Multilingual has 1,024 dimensions.
+* The context window of the model is 512 tokens.
++++
+## Prerequisites
+
+To use Cohere Embed V3 models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Cohere Embed V3 models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### A REST client
+
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
+
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+> [!TIP]
+> Additionally, Cohere offers a tailored API for specific features of the model. To use this model-provider-specific API, see the [Cohere documentation](https://docs.cohere.com/reference/about).
+
+## Work with embeddings
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with an embeddings model.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
+
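+Because this scenario uses plain REST calls, there's no SDK client to create; you send requests directly to the endpoint with the key in the `Authorization` header. As an illustration only, the following sketch uses Python's `requests` library (any HTTP client works) and reads the endpoint URL and key from the `AZURE_INFERENCE_ENDPOINT` and `AZURE_INFERENCE_CREDENTIAL` environment variables:
+
+```python
+import os
+
+import requests
+
+endpoint = os.environ["AZURE_INFERENCE_ENDPOINT"]
+key = os.environ["AZURE_INFERENCE_CREDENTIAL"]
+
+# Reuse one session so every request carries the authentication headers.
+session = requests.Session()
+session.headers.update({
+    "Authorization": f"Bearer {key}",
+    "Content-Type": "application/json",
+})
+```
+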
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
+
+The response is as follows:
-Request:
```json
- {
- "input_type": "clustering",
- "embedding_types": ["int8", "binary"],
- "truncate": "START",
- "texts":["hi", "hello"]
+{
+ "model_name": "Cohere-embed-v3-english",
+ "model_type": "embeddings",
+ "model_provider_name": "Cohere"
+}
+```
+
+### Create embeddings
+
+Create an embedding request to see the output of the model.
+
+```json
+{
+ "input": [
+ "The ultimate answer to the question of life"
+ ]
+}
+```
+
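+As an illustration only, here's how you could send that request body with the `session` and `endpoint` from the earlier Python sketch; any REST client that can issue a POST to the `/embeddings` route works the same way:
+
+```python
+payload = {
+    "input": [
+        "The ultimate answer to the question of life"
+    ]
+}
+
+# POST the request body to the /embeddings route and print the JSON response.
+response = session.post(f"{endpoint}/embeddings", json=payload)
+response.raise_for_status()
+print(response.json())
+```
+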
+> [!TIP]
+> The context window for Cohere Embed V3 models is 512 tokens. Make sure that you don't exceed this limit when creating embeddings.
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0ab1234c-d5e6-7fgh-i890-j1234k123456",
+ "object": "list",
+ "data": [
+ {
+ "index": 0,
+ "object": "embedding",
+ "embedding": [
+ 0.017196655,
+ // ...
+ -0.000687122,
+ -0.025054932,
+ -0.015777588
+ ]
+ }
+ ],
+ "model": "Cohere-embed-v3-english",
+ "usage": {
+ "prompt_tokens": 9,
+ "completion_tokens": 0,
+ "total_tokens": 9
}
+}
```
-Response:
+It can be useful to compute embeddings in input batches. The `input` parameter can be a list of strings, where each string is a different input. In turn, the response is a list of embeddings, where each embedding corresponds to the input in the same position.
+ ```json
- {
- "id": "b604881a-a5e1-4283-8c0d-acbd715bf144",
- "texts": [
- "hi",
- "hello"
- ],
- "embeddings": {
- "binary": [
- [
- ...
- ],
- [
- ...
- ]
- ],
- "int8": [
- [
- ...
- ],
- [
- ...
- ]
+{
+ "input": [
+ "The ultimate answer to the question of life",
+ "The largest planet in our solar system is Jupiter"
+ ]
+}
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0ab1234c-d5e6-7fgh-i890-j1234k123456",
+ "object": "list",
+ "data": [
+ {
+ "index": 0,
+ "object": "embedding",
+ "embedding": [
+ 0.017196655,
+ // ...
+ -0.000687122,
+ -0.025054932,
+ -0.015777588
+      ]
+    },
- "meta": {
- "api_version": {
- "version": "1"
- },
- "billed_units": {
- "input_tokens": 2
- }
- },
- "response_type": "embeddings_by_type"
+ {
+ "index": 1,
+ "object": "embedding",
+ "embedding": [
+ 0.017196655,
+ // ...
+ -0.000687122,
+ -0.025054932,
+ -0.015777588
+ ]
+ }
+ ],
+ "model": "Cohere-embed-v3-english",
+ "usage": {
+ "prompt_tokens": 19,
+ "completion_tokens": 0,
+ "total_tokens": 19
}
+}
```
-#### More inference examples
+> [!TIP]
+> Cohere Embed V3 models can take batches of up to 1,024 inputs at a time. When creating batches, make sure that you don't exceed this limit.
+
+#### Create different types of embeddings
+
+Cohere Embed V3 models can generate multiple embeddings for the same input depending on how you plan to use them. This capability allows you to retrieve more accurate embeddings for RAG patterns.
-| **Package** | **Sample Notebook** |
-|-|-|
-| CLI using CURL and Python web requests | [cohere-embed.ipynb](https://aka.ms/samples/embed-v3/webrequests)|
-| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/samples/cohere-embed/openaisdk) |
-| LangChain | [langchain.ipynb](https://aka.ms/samples/cohere-embed/langchain) |
-| Cohere SDK | [cohere-sdk.ipynb](https://aka.ms/samples/cohere-embed/cohere-python-sdk) |
-| LiteLLM SDK | [litellm.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |
+The following example shows how to create an embedding for a document that will be stored in a vector database:
-##### Retrieval Augmented Generation (RAG) and tool-use samples
-**Description** | **Package** | **Sample Notebook**
-|--|--|--
-Create a local Facebook AI Similarity Search (FAISS) vector index, using Cohere embeddings - Langchain|`langchain`, `langchain_cohere`|[cohere_faiss_langchain_embed.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere_faiss_langchain_embed.ipynb)
-Use Cohere Command R/R+ to answer questions from data in local FAISS vector index - Langchain|`langchain`, `langchain_cohere`|[command_faiss_langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_faiss_langchain.ipynb)
-Use Cohere Command R/R+ to answer questions from data in AI search vector index - Langchain|`langchain`, `langchain_cohere`|[cohere-aisearch-langchain-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-langchain-rag.ipynb)
-Use Cohere Command R/R+ to answer questions from data in AI search vector index - Cohere SDK| `cohere`, `azure_search_documents`|[cohere-aisearch-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-rag.ipynb)
-Command R+ tool/function calling, using LangChain|`cohere`, `langchain`, `langchain_cohere`|[command_tools-langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_tools-langchain.ipynb)
-## Cost and quotas
+```json
+{
+ "input": [
+ "The answer to the ultimate question of life, the universe, and everything is 42"
+ ],
+ "input_type": "document"
+}
+```
-### Cost and quota considerations for models deployed as a service
+When you work on a query to retrieve such a document, you can use the following code snippet to create the embedding for the query and maximize retrieval performance.
-Cohere models deployed as a serverless API with pay-as-you-go billing are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
-Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+```json
+{
+ "input": [
+ "What's the ultimate meaning of life?"
+ ],
+ "input_type": "query"
+}
+```
-For more information on how to track costs, see [monitor costs for models offered throughout the Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
+Cohere Embed V3 models can optimize the embeddings based on their use case.
-Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
-## Content filtering
+## More inference examples
-Models deployed as a serverless API are protected by [Azure AI Content Safety](../../ai-services/content-safety/overview.md). With Azure AI content safety, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [content filtering here](../concepts/content-filtering.md).
+| Description | Language | Sample |
+|-|-|--|
+| Web requests | Bash | [Command-R](https://aka.ms/samples/cohere-command-r/webrequests) - [Command-R+](https://aka.ms/samples/cohere-command-r-plus/webrequests) |
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+| OpenAI SDK (experimental) | Python | [Link](https://aka.ms/samples/cohere-command/openaisdk) |
+| LangChain | Python | [Link](https://aka.ms/samples/cohere/langchain) |
+| Cohere SDK | Python | [Link](https://aka.ms/samples/cohere-python-sdk) |
+| LiteLLM SDK | Python | [Link](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/litellm.ipynb) |
+
+#### Retrieval Augmented Generation (RAG) and tool use samples
+
+| Description | Packages | Sample |
+|-||--|
+| Create a local Facebook AI similarity search (FAISS) vector index, using Cohere embeddings - Langchain | `langchain`, `langchain_cohere` | [cohere_faiss_langchain_embed.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere_faiss_langchain_embed.ipynb) |
+| Use Cohere Command R/R+ to answer questions from data in local FAISS vector index - Langchain |`langchain`, `langchain_cohere` | [command_faiss_langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_faiss_langchain.ipynb) |
+| Use Cohere Command R/R+ to answer questions from data in AI search vector index - Langchain | `langchain`, `langchain_cohere` | [cohere-aisearch-langchain-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-langchain-rag.ipynb) |
+| Use Cohere Command R/R+ to answer questions from data in AI search vector index - Cohere SDK | `cohere`, `azure_search_documents` | [cohere-aisearch-rag.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/cohere-aisearch-rag.ipynb) |
+| Command R+ tool/function calling, using LangChain | `cohere`, `langchain`, `langchain_cohere` | [command_tools-langchain.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/cohere/command_tools-langchain.ipynb) |
++
+## Cost and quota considerations for Cohere family of models deployed as serverless API endpoints
+
+Cohere models deployed as a serverless API are offered by Cohere through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+
+Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+
+For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
+
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
## Related content

-- [What is Azure AI Studio?](../what-is-ai-studio.md)
-- [Azure AI FAQ article](../faq.yml)
-- [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Cohere Rerank https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-cohere-rerank.md
To create a deployment:
1. Alternatively, you can initiate a deployment by starting from your project in AI Studio.
1. From the left sidebar of your project, select **Components** > **Deployments**.
- 1. Select **+ Create deployment**.
+ 1. Select **+ Deploy model**.
1. Search for and select **Cohere-rerank-3-english** to open the Model Details page.
1. Select **Confirm** to open a serverless API deployment window for the model.
ai-studio Deploy Models Jais https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-jais.md
+
+Title: How to use Jais chat models with Azure AI Studio
+
+description: Learn how to use Jais chat models with Azure AI Studio.
+
+Last updated: 08/08/2024
+
+reviewer: hazemelh
+
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
++
+# How to use Jais chat models
+
+In this article, you learn about Jais chat models and how to use them.
+JAIS 30b Chat is an autoregressive bilingual LLM for **Arabic** and **English**. The tuned versions use supervised fine-tuning (SFT). The model is fine-tuned with both Arabic and English prompt-response pairs. The fine-tuning datasets included a wide range of instructional data across various domains. The model covers a wide range of common tasks including question answering, code generation, and reasoning over textual content. To enhance performance in Arabic, the Core42 team developed an in-house Arabic dataset and translated some open-source English instructions into Arabic.
+
+* **Context length:** JAIS supports a context length of 8K.
+* **Input:** Model input is text only.
+* **Output:** Model generates text only.
++++++
+You can learn more about the models in their respective model card:
+
+* [jais-30b-chat](https://aka.ms/azureai/landing/jais-30b-chat)
++
+## Prerequisites
+
+To use Jais chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jais chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
+
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jais chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
+
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```python
+model_info = client.get_model_info()
+```
+
+The response is as follows:
++
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
+
+```console
+Model name: jais-30b-chat
+Model type: chat-completions
+Model provider name: G42
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: jais-30b-chat
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
+
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+ """
+ Prints the chat completion with streaming. Some delay is added to simulate
+ a real-time conversation.
+ """
+ import time
+ for update in result:
+ if update.choices:
+ print(update.choices[0].delta.content, end="")
+ time.sleep(0.05)
+```
+
+You can visualize how streaming generates content:
++
+```python
+print_stream(result)
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+> [!WARNING]
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
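+As a workaround (a sketch only, not a JSON mode feature), you can ask for JSON in the prompt and validate the output yourself:
+
+```python
+import json
+
+from azure.ai.inference.models import SystemMessage, UserMessage
+
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful assistant. Reply with a single JSON object only."),
+        UserMessage(content="List three widely spoken languages as a JSON array under the key 'languages'."),
+    ],
+    temperature=0,
+)
+
+try:
+    data = json.loads(response.choices[0].message.content)
+    print(data)
+except json.JSONDecodeError:
+    # The model isn't guaranteed to return valid JSON; handle that case explicitly.
+    print("The response wasn't valid JSON:", response.choices[0].message.content)
+```
+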
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```python
+from azure.core.exceptions import HttpResponseError
+from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
+
+try:
+ response = client.complete(
+ messages=[
+ SystemMessage(content="You are an AI assistant that helps people find information."),
+ UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
+ ]
+ )
+
+ print(response.choices[0].message.content)
+
+except HttpResponseError as ex:
+    if ex.status_code == 400:
+        response = ex.response.json()
+        if isinstance(response, dict) and "error" in response:
+            # The request was blocked; print the content filter details instead of re-raising.
+            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
+        else:
+            raise
+    else:
+        raise
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++++
+You can learn more about the models in their respective model card:
+
+* [jais-30b-chat](https://aka.ms/azureai/landing/jais-30b-chat)
++
+## Prerequisites
+
+To use Jais chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jais chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jais chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: jais-30b-chat
+Model type: chat-completions
+Model provider name: G42
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: jais-30b-chat
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
+
+To stream completions, use `.asNodeStream()` when you call the model.
+
+You can visualize how streaming generates content:
++
+```javascript
+// createSseStream comes from the @azure/core-sse package.
+import { createSseStream } from "@azure/core-sse";
+
+var stream = response.body;
+if (!stream) {
+    throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+    throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+        presence_penalty: 0.1,
+        frequency_penalty: 0.8,
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+> [!WARNING]
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+        "extra-parameters": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```javascript
+try {
+ var messages = [
+ { role: "system", content: "You are an AI assistant that helps people find information." },
+ { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
+ ];
+
+ var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+ });
+
+ console.log(response.body.choices[0].message.content);
+}
+catch (error) {
+ if (error.status_code == 400) {
+ var response = JSON.parse(error.response._content);
+ if (response.error) {
+ console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
+ }
+ else
+ {
+ throw error;
+ }
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++++
+You can learn more about the models in their respective model card:
+
+* [jais-30b-chat](https://aka.ms/azureai/landing/jais-30b-chat)
++
+## Prerequisites
+
+To use Jais chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jais chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you might not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jais chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: jais-30b-chat
+Model type: chat-completions
+Model provider name: G42
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: jais-30b-chat
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
+
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
+}
+```
+
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that in this example, the call is wrapped in an asynchronous method.
+
+To visualize the output, define an asynchronous method to print the stream in the console.
+
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
+{
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
+ {
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
+ }
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
+}
+```
+
+You can visualize how streaming generates content:
++
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+> [!WARNING]
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```csharp
+try
+{
+ requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+ new ChatRequestUserMessage(
+ "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ ),
+ },
+ };
+
+ response = client.Complete(requestOptions);
+ Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+ if (ex.ErrorCode == "content_filter")
+ {
+ Console.WriteLine($"Your query has trigger Azure Content Safeaty: {ex.Message}");
+ }
+ else
+ {
+ throw;
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++++
+You can learn more about the models in their respective model card:
+
+* [jais-30b-chat](https://aka.ms/azureai/landing/jais-30b-chat)
++
+## Prerequisites
+
+To use Jais chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jais chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### A REST client
+
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
+
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jais chat models.
+
+### Create a client to consume the model
+
+When you call the endpoint directly with a REST client, there's no client object to create. Instead, send each request to your endpoint URL and pass your key (or Microsoft Entra ID token) in the `Authorization` header, as shown in the examples in this section.
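+
+For example, a basic chat completions call is an HTTP `POST` to the `/chat/completions` route of your endpoint, with the key or token passed in the `Authorization` header. This is the same request skeleton that the later examples in this article use; `<ENDPOINT_URI>` and `<TOKEN>` are placeholders for your values:
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```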
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by making the following request:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
+
+The response is as follows:
++
+```json
+{
+ "model_name": "jais-30b-chat",
+ "model_type": "chat-completions",
+ "model_provider_name": "G42"
+}
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "jais-30b-chat",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
+
+You can visualize how streaming generates content:
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "jais-30b-chat",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
+ }
+ ]
+}
+```
+
+The last message in the stream has `finish_reason` set, indicating why the generation process stopped.
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "jais-30b-chat",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+ "temperature" :0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "jais-30b-chat",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+> [!WARNING]
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "logprobs": true
+}
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are an AI assistant that helps people find information."
+ },
+ {
+ "role": "user",
+ "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ }
+ ]
+}
+```
++
+```json
+{
+ "error": {
+ "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
+ "type": null,
+ "param": "prompt",
+ "code": "content_filter",
+ "status": 400
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++
+## More inference examples
+
+For more examples of how to use Jais, see the following examples and tutorials:
+
+| Description | Language | Sample |
+|-|-|--|
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+
+## Cost and quota considerations for Jais family of models deployed as serverless API endpoints
+
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+
+Jais models deployed as a serverless API are offered by G42 through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+
+Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+
+For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
+
+## Related content
++
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Jamba https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-jamba.md
Title: How to deploy AI21's Jamba-Instruct model with Azure AI Studio
+ Title: How to use Jamba-Instruct chat models with Azure AI Studio
-description: How to deploy AI21's Jamba-Instruct model with Azure AI Studio
+description: Learn how to use Jamba-Instruct chat models with Azure AI Studio.
+ - Previously updated : 06/19/2024- Last updated : 08/08/2024 reviewer: tgokal-+++
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
-# How to deploy AI21's Jamba-Instruct model with Azure AI Studio
+# How to use Jamba-Instruct chat models
+
+In this article, you learn about Jamba-Instruct chat models and how to use them.
+The Jamba-Instruct model is AI21's production-grade, Mamba-based large language model (LLM) that uses AI21's hybrid Mamba-Transformer architecture. It's an instruction-tuned version of AI21's hybrid structured state space model (SSM) transformer Jamba model, built for reliable commercial use with respect to quality and performance.
+
+> [!TIP]
+> See our announcements of AI21's Jamba-Instruct model available now on Azure AI Model Catalog through [AI21's blog](https://aka.ms/ai21-jamba-instruct-blog) and [Microsoft Tech Community Blog](https://aka.ms/ai21-jamba-instruct-announcement).
++++++
+You can learn more about the models in their respective model card:
+* [AI21-Jamba-Instruct](https://aka.ms/azureai/landing/AI21-Jamba-Instruct)
-In this article, you learn how to use Azure AI Studio to deploy AI21's Jamba-Instruct model as a serverless API with pay-as-you-go billing.
-The Jamba Instruct model is AI21's production-grade Mamba-based large language model (LLM) which leverages AI21's hybrid Mamba-Transformer architecture. It's an instruction-tuned version of AI21's hybrid structured state space model (SSM) transformer Jamba model. The Jamba Instruct model is built for reliable commercial use with respect to quality and performance.
+## Prerequisites
-## Deploy the Jamba Instruct model as a serverless API
+To use Jamba-Instruct chat models with Azure AI Studio, you need the following prerequisites:
-Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing, providing a way to consume them as an API without hosting them on your subscription, while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription.
+### A model deployment
-The [AI21-Jamba-Instruct model](https://aka.ms/aistudio/landing/ai21-labs-jamba-instruct) deployed as a serverless API with pay-as-you-go billing is [offered by AI21 through Microsoft Azure Marketplace](https://aka.ms/azure-marketplace-offer-ai21-jamba-instruct). AI21 can change or update the terms of use and pricing of this model.
+**Deployment to serverless APIs**
-To get started with Jamba Instruct deployed as a serverless API, explore our integrations with [LangChain](https://aka.ms/ai21-jamba-instruct-langchain-sample), [LiteLLM](https://aka.ms/ai21-jamba-instruct-litellm-sample), [OpenAI](https://aka.ms/ai21-jamba-instruct-openai-sample) and the [Azure API](https://aka.ms/ai21-jamba-instruct-azure-api-sample).
+Jamba-Instruct chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
+
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
> [!TIP]
-> See our announcements of AI21's Jamba-Instruct model available now on Azure AI Model Catalog through [AI21's blog](https://aka.ms/ai21-jamba-instruct-blog) and [Microsoft Tech Community Blog](https://aka.ms/ai21-jamba-instruct-announcement).
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jamba-Instruct chat models.
+### Create a client to consume the model
-### Prerequisites
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.-- An [AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for Jamba Instruct is only available with hubs created in these regions:
- * East US
- * East US 2
- * North Central US
- * South Central US
- * West US
- * West US 3
- * Sweden Central
-
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
-- An Azure [AI Studio project](../how-to/create-projects.md).-- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
- - On the Azure subscription—to subscribe the AI Studio project to the Azure Marketplace offering, once for each project, per offering:
- - `Microsoft.MarketplaceOrdering/agreements/offers/plans/read`
- - `Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action`
- - `Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read`
- - `Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read`
- - `Microsoft.SaaS/register/action`
-
- - On the resource group—to create and use the SaaS resource:
- - `Microsoft.SaaS/resources/read`
- - `Microsoft.SaaS/resources/write`
-
- - On the AI Studio project—to deploy endpoints (the Azure AI Developer role contains these permissions already):
- - `Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*`
- - `Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*`
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
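+
+If your deployment is configured for Microsoft Entra ID authentication, you can pass a token credential instead of a key. The following is a minimal sketch; it assumes the `azure-identity` package is installed and that your signed-in identity has access to the endpoint:
+
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.identity import DefaultAzureCredential
+
+# A minimal sketch: DefaultAzureCredential picks up your signed-in Azure identity.
+client = ChatCompletionsClient(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=DefaultAzureCredential(),
+)
+```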
- For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
+### Get the model's capabilities
-### Create a new deployment
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
-These steps demonstrate the deployment of AI21-Jamba-Instruct. To create a deployment:
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Select **Model catalog** from the left sidebar.
-1. Search for and select **AI21-Jamba-Instruct** to open its Details page.
-1. Select **Deploy** to open a serverless API deployment window for the model.
-1. Alternatively, you can initiate a deployment by starting from your project in AI Studio.
- 1. From the left sidebar of your project, select **Components** > **Deployments**.
- 1. Select **+ Create deployment**.
- 1. Search for and select **AI21-Jamba-Instruct**. to open the Model's Details page.
- 1. Select **Confirm** to open a serverless API deployment window for the model.
-1. Select the project in which you want to deploy your model. To deploy the AI21-Jamba-Instruct model, your project must be in one of the regions listed in the [Prerequisites](#prerequisites) section.
-1. In the deployment wizard, select the link to **Azure Marketplace Terms**, to learn more about the terms of use.
-1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
-1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering. This step requires that your account has the Azure subscription permissions and resource group permissions listed in the [Prerequisites](#prerequisites). Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending. Currently, you can have only one deployment for each model within a project.
-1. Once you subscribe the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. If this scenario applies to you, there's a **Continue to deploy** option to select.
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
-1. Return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [Reference](#reference-for-jamba-instruct-deployed-as-a-serverless-api) section.
-1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
+```python
+model_info = client.get_model_info()
+```
+
+The response is as follows:
-To learn about billing for the AI21-Jamba-Instruct model deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Jamba Instruct deployed as a serverless API](#cost-and-quota-considerations-for-jamba-instruct-deployed-as-a-serverless-api).
-### Consume Jamba Instruct as a serverless API
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
-You can consume Jamba Instruct models as follows:
+```console
+Model name: AI21-Jamba-Instruct
+Model type: chat-completions
+Model provider name: AI21
+```
-1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
+### Create a chat completion request
-1. Find and select the deployment you created.
+The following example shows how you can create a basic chat completions request to the model.
-1. Copy the **Target** URL and the **Key** value.
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
-1. Make an API request.
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
-For more information on using the APIs, see the [reference](#reference-for-jamba-instruct-deployed-as-a-serverless-api) section.
+The response is as follows, where you can see the model's usage statistics:
-## Reference for Jamba Instruct deployed as a serverless API
-Jamba Instruct models accept both of these APIs:
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
-- The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` for multi-turn chat or single-turn question-answering. This API is supported because Jamba Instruct is fine-tuned for chat completion.-- [AI21's Azure Client](https://docs.ai21.com/reference/jamba-instruct-api). For more information about the REST endpoint being called, visit [AI21's REST documentation](https://docs.ai21.com/reference/jamba-instruct-api).
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: AI21-Jamba-Instruct
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
-### Azure AI model inference API
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
-The [Azure AI model inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
+#### Stream content
-Single-turn and multi-turn chat have the same request and response format, except that question answering (single-turn) involves only a single user message in the request, while multi-turn chat requires that you send the entire chat message history in each request.
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
-In a multi-turn chat, the message thread has the following attributes:
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
-- Includes all messages from the user and the model, ordered from oldest to newest.-- Messages alternate between `user` and `assistant` role messages-- Optionally, the message thread starts with a system message to provide context.
-The following pseudocode is an example of the message stack for the fourth call in a chat request that includes an initial system message.
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
-```json
-[
- {"role": "system", "message": "Some contextual information here"},
- {"role": "user", "message": "User message 1"},
- {"role": "assistant", "message": "System response 1"},
- {"role": "user", "message": "User message 2"},
- {"role": "assistant"; "message": "System response 2"},
- {"role": "user", "message": "User message 3"},
- {"role": "assistant", "message": "System response 3"},
- {"role": "user", "message": "User message 4"}
-]
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+ """
+ Prints the chat completion with streaming. Some delay is added to simulate
+ a real-time conversation.
+ """
+ import time
+ for update in result:
+ if update.choices:
+ print(update.choices[0].delta.content, end="")
+ time.sleep(0.05)
```
-### AI21's Azure client
+You can visualize how streaming generates content:
-Use the method `POST` to send the request to the `/v1/chat/completions` route:
-__Request__
+```python
+print_stream(result)
+```
-```HTTP/1.1
-POST /v1/chat/completions HTTP/1.1
-Host: <DEPLOYMENT_URI>
-Authorization: Bearer <TOKEN>
-Content-type: application/json
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+> [!WARNING]
+> Jamba doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```python
+from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
+from azure.core.exceptions import HttpResponseError
+
+try:
+ response = client.complete(
+ messages=[
+ SystemMessage(content="You are an AI assistant that helps people find information."),
+ UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
+ ]
+ )
+
+ print(response.choices[0].message.content)
+
+except HttpResponseError as ex:
+ if ex.status_code == 400:
+ response = ex.response.json()
+ if isinstance(response, dict) and "error" in response:
+ print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
+ else:
+ raise
+    else:
+        raise
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++++
+You can learn more about the models in their respective model card:
+
+* [AI21-Jamba-Instruct](https://aka.ms/azureai/landing/AI21-Jamba-Instruct)
++
+## Prerequisites
+
+To use Jamba-Instruct chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jamba-Instruct chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jamba-Instruct chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
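+
+If your deployment is configured for Microsoft Entra ID authentication, you can pass a token credential instead of a key. The following is a minimal sketch; it assumes the `@azure/identity` package is installed (`npm install @azure/identity`) and that your signed-in identity has access to the endpoint:
+
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { DefaultAzureCredential } from "@azure/identity";
+
+// A minimal sketch: DefaultAzureCredential picks up your signed-in Azure identity.
+const client = new ModelClient(
+    process.env.AZURE_INFERENCE_ENDPOINT,
+    new DefaultAzureCredential()
+);
+```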
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: AI21-Jamba-Instruct
+Model type: chat-completions
+Model provider name: AI21
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: AI21-Jamba-Instruct
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
```
-#### Request schema
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
-Payload is a JSON formatted string containing the following parameters:
+#### Stream content
-| Key | Type | Required/Default | Allowed values | Description |
-| - | -- | :--:| -- | |
-| `model` | `string` | Y | Must be `jamba-instruct` |
-| `messages` | `list[object]` | Y | A list of objects, one per message, from oldest to newest. The oldest message can be role `system`. All later messages must alternate between user and assistant roles. See the message object definition below. |
-| `max_tokens` | `integer` | N <br>`4096` | 0 ΓÇô 4096 | The maximum number of tokens to allow for each generated response message. Typically the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences") |
-| `temperature` | `float` | N <br>`1` | 0.0 ΓÇô 2.0 | How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. We recommend altering this or `top_p`, but not both. |
-| `top_p` | `float` | N <br>`1` | 0 < _value_ <=1.0 | Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens, and 0.01 means the pool of only the most likely next tokens. |
-| `stop` | `string` OR `list[string]` | N <br> | "" | String or list of strings containing the word(s) where the API should stop generating output. Newlines are allowed as "\n". The returned text won't contain the stop sequence. |
-| `n` | `integer` | N <br>`1` | 1 – 16 | How many responses to generate for each prompt. With Azure AI Studio's Playground, `n=1` as we work on multi-response Playground. |
-| `stream` | `boolean` | N <br>`False` | `True` OR `False` | Whether to enable streaming. If true, results are returned one token at a time. If set to true, `n` must be 1, which is automatically set. |
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
-The `messages` object has the following fields:
- - `role`: [_string, required_] The author or purpose of the message. One of the following values:
- - `user`: Input provided by the user. Any instructions given here that conflict with instructions given in the `system` prompt take precedence over the `system` prompt instructions.
- - `assistant`: A response generated by the model.
- - `system`: Initial instructions to provide general guidance on the tone and voice of the generated message. An initial system message is optional, but recommended to provide guidance on the tone of the chat. For example, "You are a helpful chatbot with a background in earth sciences and a charming French accent."
- - `content`: [_string, required_] The content of the message.
+To stream completions, use `.asNodeStream()` when you call the model.
+You can visualize how streaming generates content:
++
+```javascript
+// createSseStream is available from the @azure/core-sse package.
+import { createSseStream } from "@azure/core-sse";
+
+var stream = response.body;
+if (!stream) {
+    throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+ throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+        presence_penalty: 0.1,
+        frequency_penalty: 0.8,
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+> [!WARNING]
+> Jamba doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```javascript
+try {
+ var messages = [
+ { role: "system", content: "You are an AI assistant that helps people find information." },
+ { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
+ ];
+
+ var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+ });
+
+ console.log(response.body.choices[0].message.content);
+}
+catch (error) {
+ if (error.status_code == 400) {
+ var response = JSON.parse(error.response._content);
+ if (response.error) {
+ console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
+ }
+ else
+ {
+ throw error;
+ }
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++++
+You can learn more about the models in their respective model card:
+
+* [AI21-Jamba-Instruct](https://aka.ms/azureai/landing/AI21-Jamba-Instruct)
++
+## Prerequisites
+
+To use Jamba-Instruct chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jamba-Instruct chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you might not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
-#### Request example
+## Work with chat completions
-__Single-turn example__
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
-```JSON
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jamba-Instruct chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
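+
+If your deployment is configured for Microsoft Entra ID authentication, you can pass a token credential instead of a key. The following is a minimal sketch; it uses the `Azure.Identity` package installed earlier and assumes that your signed-in identity has access to the endpoint:
+
+```csharp
+// A minimal sketch: DefaultAzureCredential picks up your signed-in Azure identity.
+ChatCompletionsClient client = new ChatCompletionsClient(
+    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+    new DefaultAzureCredential()
+);
+```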
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: AI21-Jamba-Instruct
+Model type: chat-completions
+Model provider name: AI21
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
- "model": "jamba-instruct",
- "messages": [
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: AI21-Jamba-Instruct
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
- "role":"user",
- "content":"Who was the first emperor of rome?"}
- ],
- "temperature": 0.8,
- "max_tokens": 512
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
+
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
}
```
-__Chat example (fourth request containing third user response)__
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that, in this example, the call is wrapped in an asynchronous method.
+
+To visualize the output, define an asynchronous method to print the stream in the console.
-```JSON
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
{
- "model": "jamba-instruct",
- "messages": [
- {"role": "system",
- "content": "You are a helpful genie just released from a bottle. You start the conversation with 'Thank you for freeing me! I grant you one wish.'"},
- {"role":"user",
- "content":"I want a new car"},
- {"role":"assistant",
- "content":"🚗 Great choice, I can definitely help you with that! Before I grant your wish, can you tell me what kind of car you're looking for?"},
- {"role":"user",
- "content":"A corvette"},
- {"role":"assistant",
- "content":"Great choice! What color and year?"},
- {"role":"user",
- "content":"1963 black split window Corvette"}
- ],
- "n":3
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
+ {
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
+ }
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
}
```
-#### Response schema
+You can visualize how streaming generates content:
-The response depends slightly on whether the result is streamed or not.
-In a _non-streamed result_, all responses are delivered together in a single response, which also includes a `usage` property.
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
-In a _streamed result_,
+#### Explore more parameters supported by the inference client
-* Each response includes a single token in the `choices` field.
-* The `choices` object structure is different.
-* Only the last response includes a `usage` object.
-* The entire response is wrapped in a `data` object.
-* The final response object is `data: [DONE]`.
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
-The response payload is a dictionary with the following fields.
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
-| Key | Type | Description |
-| | | - |
-| `id` | `string` | A unique identifier for the request. |
-| `model` | `string` | Name of the model used. |
-| `choices` | `list[object`]|The model-generated response text. For a non-streaming response it is a list with `n` items. For a streaming response, it is a single object containing a single token. See the object description below. |
-| `created` | `integer` | The Unix timestamp (in seconds) of when the completion was created. |
-| `object` | `string` | The object type, which is always `chat.completion`. |
-| `usage` | `object` | Usage statistics for the completion request. See below for details. |
+> [!WARNING]
+> Jamba doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
-The `choices` response object contains the model-generated response. The object has the following fields:
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
-| Key | Type | Description |
-| | | |
-| `index` | `integer` | Zero-based index of the message in the list of messages. Might not correspond to the position in the list. For streamed messages this is always zero. |
-| `message` OR `delta` | `object` | The generated message (or token in a streaming response). Same object type as described in the request with two changes:<br> - In a non-streaming response, this object is called `message`. <br>- In a streaming response, it is called `delta`, and contains either `message` or `role` but never both. |
-| `finish_reason` | `string` | The reason the model stopped generating tokens: <br>- `stop`: The model reached a natural stop point, or a provided stop sequence. <br>- `length`: Max number of tokens have been reached. <br>- `content_filter`: The generated response violated a responsible AI policy. <br>- `null`: Streaming only. In a streaming response, all responses except the last will be `null`. |
+### Pass extra parameters to the model
-The `usage` response object contains the following fields.
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
-| Key | Type | Value |
-| - | | -- |
-| `prompt_tokens` | `integer` | Number of tokens in the prompt. Note that the prompt token count includes extra tokens added by the system to format the prompt list into a single string as required by the model. The number of extra tokens is typically proportional to the number of messages in the thread, and should be relatively small. |
-| `completion_tokens` | `integer` | Number of tokens generated in the completion. |
-| `total_tokens` | `integer` | Total tokens.
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
-#### Non-streaming response example
-```JSON
+```csharp
+requestOptions = new ChatCompletionsOptions()
{
- "id":"cmpl-524c73beb8714d878e18c3b5abd09f2a",
- "choices":[
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```csharp
+try
+{
+ requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+ new ChatRequestUserMessage(
+ "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ ),
+ },
+ };
+
+ response = client.Complete(requestOptions);
+ Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+ if (ex.ErrorCode == "content_filter")
+ {
+        Console.WriteLine($"Your query has triggered Azure AI Content Safety: {ex.Message}");
+ }
+ else
{
- "index":0,
- "message":{
- "role":"assistant",
- "content":"The human nose can detect over 1 trillion different scents, making it one of the most sensitive smell organs in the animal kingdom."
- },
- "finishReason":"stop"
+ throw;
}
- ],
- "created": 1717487036,
- "usage":{
- "promptTokens":116,
- "completionTokens":30,
- "totalTokens":146
- }
}
```
-#### Streaming response example
-```JSON
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"role": "assistant"}, "created": 1717487336, "finish_reason": null}]}
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"content": ""}, "created": 1717487336, "finish_reason": null}]}
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"content": " The"}, "created": 1717487336, "finish_reason": null}]}
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"content": " first e"}, "created": 1717487336, "finish_reason": null}]}
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"content": "mpe"}, "created": 1717487336, "finish_reason": null}]}
-... 115 responses omitted for sanity ...
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"content": "me"}, "created": 1717487336, "finish_reason": null}]}
-data: {"id": "cmpl-8e8b2f6556f94714b0cd5cfe3eeb45fc", "choices": [{"index": 0, "delta": {"content": "."}, "created": 1717487336,"finish_reason": "stop"}], "usage": {"prompt_tokens": 107, "completion_tokens": 121, "total_tokens": 228}}
-data: [DONE]
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++++
+You can learn more about the models in their respective model card:
+
+* [AI21-Jamba-Instruct](https://aka.ms/azureai/landing/AI21-Jamba-Instruct)
++
+## Prerequisites
+
+To use Jamba-Instruct chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Jamba-Instruct chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### A REST client
+
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
+
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Jamba-Instruct chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. When you use a REST client, you send requests directly to the endpoint URL and authenticate them with either your key or Microsoft Entra ID credentials.
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
```
-## Cost and quotas
+The response is as follows:
-### Cost and quota considerations for Jamba Instruct deployed as a serverless API
-The Jamba Instruct model is deployed as a serverless API and is offered by AI21 through Azure Marketplace and integrated with Azure AI studio for use. You can find Azure Marketplace pricing when deploying or fine-tuning models.
+```json
+{
+ "model_name": "AI21-Jamba-Instruct",
+ "model_type": "chat-completions",
+ "model_provider_name": "AI21"
+}
+```
-Each time a workspace subscribes to a given model offering from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently.
+### Create a chat completion request
-For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
+The following example shows how you can create a basic chat completions request to the model.
+
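+The request is sent to the `/chat/completions` route of your endpoint. The following sketch shows the request line and headers, assuming key-based authentication; `<ENDPOINT_URI>` and `<TOKEN>` are placeholders for your endpoint URL and key:
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
+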
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "AI21-Jamba-Instruct",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
+
+You can visualize how streaming generates content:
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "AI21-Jamba-Instruct",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
+ }
+ ]
+}
+```
+
+The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "AI21-Jamba-Instruct",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+    "temperature": 0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "AI21-Jamba-Instruct",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+> [!WARNING]
+> Jamba doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "logprobs": true
+}
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are an AI assistant that helps people find information."
+ },
+ {
+ "role": "user",
+ "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ }
+ ]
+}
+```
++
+```json
+{
+ "error": {
+ "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
+ "type": null,
+ "param": "prompt",
+ "code": "content_filter",
+ "status": 400
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++
+## More inference examples
+
+For more examples of how to use Jamba, see the following examples and tutorials:
+
+| Description | Language | Sample |
+|-|-|--|
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+
+## Cost and quota considerations for Jamba family of models deployed as serverless API endpoints
Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
-## Content filtering
+Jamba models deployed as a serverless API are offered by AI21 through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
-Models deployed as a serverless API are protected by Azure AI content safety. With Azure AI content safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](/azure/ai-services/content-safety/overview).
+Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+
+For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
## Related content
-- [What is Azure AI Studio?](../what-is-ai-studio.md)
-- [Azure AI FAQ article](../faq.yml)
-- [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Llama https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-llama.md
Title: How to deploy Meta Llama 3.1 models with Azure AI Studio
+ Title: How to use Meta Llama chat models with Azure AI Studio
-description: Learn how to deploy Meta Llama 3.1 models with Azure AI Studio.
-
+description: Learn how to use Meta Llama chat models with Azure AI Studio.
+ Previously updated : 7/21/2024 Last updated : 08/08/2024 reviewer: shubhirajMsft -+
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
-# How to deploy Meta Llama 3.1 models with Azure AI Studio
+# How to use Meta Llama chat models
+In this article, you learn about Meta Llama chat models and how to use them.
+Meta Llama 2 and 3 models and tools are a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The model family also includes fine-tuned versions optimized for dialogue use cases with reinforcement learning from human feedback (RLHF).
-In this article, you learn about the Meta Llama model family. You also learn how to use Azure AI Studio to deploy models from this set either to serverless APIs with pay-as you go billing or to managed compute.
- > [!IMPORTANT]
- > Read more about the announcement of Meta Llama 3.1 405B Instruct and other Llama 3.1 models available now on Azure AI Model Catalog: [Microsoft Tech Community Blog](https://aka.ms/meta-llama-3.1-release-on-azure) and from [Meta Announcement Blog](https://aka.ms/meta-llama-3.1-release-announcement).
-Now available on Azure AI Models-as-a-Service:
-- `Meta-Llama-3.1-405B-Instruct`-- `Meta-Llama-3.1-70B-Instruct`-- `Meta-Llama-3.1-8B-Instruct`
-The Meta Llama 3.1 family of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). All models support long context length (128k) and are optimized for inference with support for grouped query attention (GQA). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
+## Meta Llama chat models
-See the following GitHub samples to explore integrations with [LangChain](https://aka.ms/meta-llama-3.1-405B-instruct-langchain), [LiteLLM](https://aka.ms/meta-llama-3.1-405B-instruct-litellm), [OpenAI](https://aka.ms/meta-llama-3.1-405B-instruct-openai) and the [Azure API](https://aka.ms/meta-llama-3.1-405B-instruct-webrequests).
+The Meta Llama chat models include the following models:
-## Deploy Meta Llama 3.1 405B Instruct as a serverless API
+# [Meta Llama-3.1](#tab/meta-llama-3-1)
-Meta Llama 3.1 models - like `Meta Llama 3.1 405B Instruct` - can be deployed as a serverless API with pay-as-you-go, providing a way to consume them as an API without hosting them on your subscription while keeping the enterprise security and compliance organizations need. This deployment option doesn't require quota from your subscription. Meta Llama 3.1 models are deployed as a serverless API with pay-as-you-go billing through Microsoft Azure Marketplace, and they might add more terms of use and pricing.
+The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned, text-only models (8B, 70B, and 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
-### Azure Marketplace model offerings
-# [Meta Llama 3.1](#tab/llama-three)
+The following models are available:
-The following models are available in Azure Marketplace for Llama 3.1 and Llama 3 when deployed as a service with pay-as-you-go:
+* [Meta-Llama-3.1-405B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-405B-Instruct)
+* [Meta-Llama-3.1-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-70B-Instruct)
+* [Meta-Llama-3.1-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-8B-Instruct)
-* [Meta-Llama-3.1-405B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3.1-405B-instruct)
-* [Meta-Llama-3.1-70B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3.1-70B-instruct)
-* [Meta Llama-3.1-8B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3.1-8B-instruct)
-* [Meta-Llama-3-70B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3-70b-chat)
-* [Meta-Llama-3-8B-Instruct (preview)](https://aka.ms/aistudio/landing/meta-llama-3-8b-chat)
-# [Meta Llama 2](#tab/llama-two)
+# [Meta Llama-3](#tab/meta-llama-3)
+
+Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
++
+The following models are available:
+
+* [Meta-Llama-3-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-70B-Instruct)
+* [Meta-Llama-3-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-8B-Instruct)
++
+# [Meta Llama-2](#tab/meta-llama-2)
+
+Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama-2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
++
+The following models are available:
+
+* [Llama-2-70b-chat](https://aka.ms/azureai/landing/Llama-2-70b-chat)
+* [Llama-2-13b-chat](https://aka.ms/azureai/landing/Llama-2-13b-chat)
+* [Llama-2-7b-chat](https://aka.ms/azureai/landing/Llama-2-7b-chat)
-The following models are available in Azure Marketplace for Llama 2 when deployed as a serverless API:
-* Meta Llama-2-7B (preview)
-* Meta Llama 2 7B-Chat (preview)
-* Meta Llama-2-13B (preview)
-* Meta Llama 2 13B-Chat (preview)
-* Meta Llama-2-70B (preview)
-* Meta Llama 2 70B-Chat (preview)
-
-If you need to deploy a different model, [deploy it to managed compute](#deploy-meta-llama-models-to-managed-compute) instead.
+## Prerequisites
-### Prerequisites
+To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
-# [Meta Llama 3](#tab/llama-three)
+### A model deployment
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.-- An [AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for Meta Llama 3.1 and Llama 3 is only available with hubs created in these regions:
+**Deployment to serverless APIs**
- * East US
- * East US 2
- * North Central US
- * South Central US
- * West US
- * West US 3
- * Sweden Central
-
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
-- An [AI Studio project](../how-to/create-projects.md) in Azure AI Studio.-- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:-
- - On the Azure subscriptionΓÇöto subscribe the AI Studio project to the Azure Marketplace offering, once for each project, per offering:
- - `Microsoft.MarketplaceOrdering/agreements/offers/plans/read`
- - `Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action`
- - `Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read`
- - `Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read`
- - `Microsoft.SaaS/register/action`
-
- - On the resource groupΓÇöto create and use the SaaS resource:
- - `Microsoft.SaaS/resources/read`
- - `Microsoft.SaaS/resources/write`
-
- - On the AI Studio projectΓÇöto deploy endpoints (the Azure AI Developer role contains these permissions already):
- - `Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*`
- - `Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*`
-
- For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
-
-# [Meta Llama 2](#tab/llama-two)
+Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+**Deployment to a self-hosted managed compute**
+
+Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.-- An [AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for Meta Llama 2 is only available with hubs created in these regions:
+### The inference package installed
- * East US
- * East US 2
- * North Central US
- * South Central US
- * West US
- * West US 3
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
-- An [AI Studio project](../how-to/create-projects.md) in Azure AI Studio.-- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __owner__ or __contributor__ role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:-
- - On the Azure subscriptionΓÇöto subscribe the AI Studio project to the Azure Marketplace offering, once for each project, per offering:
- - `Microsoft.MarketplaceOrdering/agreements/offers/plans/read`
- - `Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action`
- - `Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read`
- - `Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read`
- - `Microsoft.SaaS/register/action`
-
- - On the resource groupΓÇöto create and use the SaaS resource:
- - `Microsoft.SaaS/resources/read`
- - `Microsoft.SaaS/resources/write`
-
- - On the AI Studio projectΓÇöto deploy endpoints (the Azure AI Developer role contains these permissions already):
- - `Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*`
- - `Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*`
-
- For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
+Once you have these prerequisites, install the Azure AI inference package with the following command:
-
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama chat models.
-### Create a new deployment
+### Create a client to consume the model
-# [Meta Llama 3](#tab/llama-three)
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
-To create a deployment:
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Choose `Meta-Llama-3.1-405B-Instruct` deploy from the Azure AI Studio [model catalog](https://ai.azure.com/explore/models).
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
- Alternatively, you can initiate deployment by starting from your project in AI Studio. Select a project and then select **Deployments** > **+ Create**.
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
-1. On the **Details** page for `Meta-Llama-3.1-405B-Instruct`, select **Deploy** and then select **Serverless API with Azure AI Content Safety**.
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
-1. Select the project in which you want to deploy your models. To use the pay-as-you-go model deployment offering, your workspace must belong to the **East US 2** or **Sweden Central** region.
-1. On the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use. You can also select the **Marketplace offer details** tab to learn about pricing for the selected model.
-1. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering (for example, `Meta-Llama-3.1-405B-Instruct`) from Azure Marketplace. This step requires that your account has the Azure subscription permissions and resource group permissions listed in the prerequisites. Each project has its own subscription to the particular Azure Marketplace offering, which allows you to control and monitor spending. Select **Subscribe and Deploy**.
- > [!NOTE]
- > Subscribing a project to a particular Azure Marketplace offering (in this case, Meta-Llama-3-70B) requires that your account has **Contributor** or **Owner** access at the subscription level where the project is created. Alternatively, your user account can be assigned a custom role that has the Azure subscription permissions and resource group permissions listed in the [prerequisites](#prerequisites).
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.identity import DefaultAzureCredential
-1. Once you sign up the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. Therefore, you don't need to have the subscription-level permissions for subsequent deployments. If this scenario applies to you, select **Continue to deploy**.
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=DefaultAzureCredential(),
+)
+```
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
+### Get the model's capabilities
-1. Select **Open in playground** to start interacting with the model.
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
-1. You can return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**, which you can use to call the deployment and generate completions.
-1. You can always find the endpoint's details, URL, and access keys by navigating to the project page and selecting **Deployments** from the left menu.
+```python
+model_info = client.get_model_info()
+```
-To learn about billing for Meta Llama models deployed with pay-as-you-go, see [Cost and quota considerations for Llama 3 models deployed as a service](#cost-and-quota-considerations-for-meta-llama-31-models-deployed-as-a-service).
+The response is as follows:
-# [Meta Llama 2](#tab/llama-two)
-To create a deployment:
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
+
+```console
+Model name: Meta-Llama-3.1-405B-Instruct
+Model type: chat-completions
+Model provider name: Meta
+```
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Choose the model you want to deploy from the Azure AI Studio [model catalog](https://ai.azure.com/explore/models).
+### Create a chat completion request
- Alternatively, you can initiate deployment by starting from your project in AI Studio. Select a project and then select **Deployments** > **+ Create**.
+The following example shows how you can create a basic chat completions request to the model.
-1. On the model's **Details** page, select **Deploy** and then select **Serverless API with Azure AI Content Safety**.
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
- :::image type="content" source="../media/deploy-monitor/llama/deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model with the pay-as-you-go option." lightbox="../media/deploy-monitor/llama/deploy-pay-as-you-go.png":::
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
-1. Select the project in which you want to deploy your models. To use the pay-as-you-go model deployment offering, your workspace must belong to the **East US 2** or **West US 3** region.
-1. On the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use. You can also select the **Marketplace offer details** tab to learn about pricing for the selected model.
-1. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering (for example, Meta-Llama-2-7B) from Azure Marketplace. This step requires that your account has the Azure subscription permissions and resource group permissions listed in the prerequisites. Each project has its own subscription to the particular Azure Marketplace offering, which allows you to control and monitor spending. Select **Subscribe and Deploy**.
+The response is as follows, where you can see the model's usage statistics:
- > [!NOTE]
- > Subscribing a project to a particular Azure Marketplace offering (in this case, Meta-Llama-2-7B) requires that your account has **Contributor** or **Owner** access at the subscription level where the project is created. Alternatively, your user account can be assigned a custom role that has the Azure subscription permissions and resource group permissions listed in the [prerequisites](#prerequisites).
- :::image type="content" source="../media/deploy-monitor/llama/deploy-marketplace-terms.png" alt-text="A screenshot showing the terms and conditions of a given model." lightbox="../media/deploy-monitor/llama/deploy-marketplace-terms.png":::
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
-1. Once you sign up the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. Therefore, you don't need to have the subscription-level permissions for subsequent deployments. If this scenario applies to you, select **Continue to deploy**.
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Meta-Llama-3.1-405B-Instruct
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
- :::image type="content" source="../media/deploy-monitor/llama/deploy-pay-as-you-go-project.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="../media/deploy-monitor/llama/deploy-pay-as-you-go-project.png":::
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
+#### Stream content
- :::image type="content" source="../media/deploy-monitor/llama/deployment-name.png" alt-text="A screenshot showing how to indicate the name of the deployment you want to create." lightbox="../media/deploy-monitor/llama/deployment-name.png":::
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
-1. Select **Open in playground** to start interacting with the model.
-1. You can return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**, which you can use to call the deployment and generate completions.
-1. You can always find the endpoint's details, URL, and access keys by navigating to your project and selecting **Deployments** from the left menu.
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
-To learn about billing for Llama models deployed with pay-as-you-go, see [Cost and quota considerations for Llama 3 models deployed as a service](#cost-and-quota-considerations-for-meta-llama-31-models-deployed-as-a-service).
-
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
+
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+ """
+ Prints the chat completion with streaming. Some delay is added to simulate
+ a real-time conversation.
+ """
+ import time
+ for update in result:
+ if update.choices:
+ print(update.choices[0].delta.content, end="")
+ time.sleep(0.05)
+```
+
+You can visualize how streaming generates content:
++
+```python
+print_stream(result)
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+> [!WARNING]
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
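+For example, a minimal sketch that asks for JSON in the prompt and then validates the output; the message wording and parsing fallback are illustrative:
+
+```python
+import json
+
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful assistant. Reply only with a JSON object such as {\"answer\": \"...\"}."),
+        UserMessage(content="How many languages are in the world?"),
+    ],
+    temperature=0,
+)
+
+try:
+    parsed = json.loads(response.choices[0].message.content)
+except json.JSONDecodeError:
+    parsed = None  # The output isn't guaranteed to be valid JSON, so handle parse failures.
+```
+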
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
-### Consume Meta Llama models as a service
+### Pass extra parameters to the model
-# [Meta Llama 3](#tab/llama-three)
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
-Models deployed as a service can be consumed using either the chat or the completions API, depending on the type of model you deployed.
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
-1. Select your project or hub and then select **Deployments** from the left menu.
-1. Find and select the `Meta-Llama-3.1-405B-Instruct` deployment you created.
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+The following extra parameters can be passed to Meta Llama chat models:
+
+| Name | Description | Type |
+| -- | | |
+| `n` | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `best_of` | Generates best_of completions server-side and returns the *best* (the one with the lowest log probability per token). Results can't be streamed. When used with `n`, `best_of` controls the number of candidate completions and `n` specifies how many to return; `best_of` must be greater than `n`. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `logprobs` | The number of most likely tokens to return log probabilities for, along with the chosen tokens. For example, if `logprobs` is 10, the API returns a list of the 10 most likely tokens. The API always returns the logprob of the sampled token, so there might be up to `logprobs`+1 elements in the response. | `integer` |
+| `ignore_eos` | Whether to ignore the `EOS` token and continue generating tokens after the `EOS` token is generated. | `boolean` |
+| `use_beam_search` | Whether to use beam search instead of sampling. In that case, `best_of` must be greater than 1 and `temperature` must be 0. | `boolean` |
+| `stop_token_ids` | List of IDs for tokens that, when generated, stop further token generation. The returned output contains the stop tokens unless the stop tokens are special tokens. | `array` |
+| `skip_special_tokens` | Whether to skip special tokens in the output. | `boolean` |
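+
+For instance, the following sketch passes a couple of these parameters through `model_extras`; the values are illustrative, and you should confirm in the model documentation which parameters your deployment accepts:
+
+```python
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful assistant."),
+        UserMessage(content="How many languages are in the world?"),
+    ],
+    model_extras={
+        "n": 2,                       # Generate two candidate completions
+        "skip_special_tokens": True,  # Drop special tokens from the output
+    },
+)
+```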
++
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```python
+from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
+from azure.core.exceptions import HttpResponseError
+
+try:
+ response = client.complete(
+ messages=[
+ SystemMessage(content="You are an AI assistant that helps people find information."),
+ UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
+ ]
+ )
+
+ print(response.choices[0].message.content)
+
+except HttpResponseError as ex:
+ if ex.status_code == 400:
+ response = ex.response.json()
+ if isinstance(response, dict) and "error" in response:
+ print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
+ else:
+ raise
+    else:
+        raise
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
+++
-1. Select **Open in playground**.
+## Meta Llama chat models
-1. Select **View code** and copy the **Endpoint** URL and the **Key** value.
+The Meta Llama chat models include the following models:
-1. Make an API request based on the type of model you deployed.
+# [Meta Llama-3.1](#tab/meta-llama-3-1)
- - For completions models, such as `Meta-Llama-3-8B`, use the [`/completions`](#completions-api) API.
- - For chat models, such as `Meta-Llama-3.1-405B-Instruct`, use the [`/chat/completions`](#chat-api) API.
+The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned, text-only models (8B, 70B, and 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
- For more information on using the APIs, see the [reference](#reference-for-meta-llama-31-models-deployed-as-a-service) section.
-# [Meta Llama 2](#tab/llama-two)
+The following models are available:
+* [Meta-Llama-3.1-405B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-405B-Instruct)
+* [Meta-Llama-3.1-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-70B-Instruct)
+* [Meta-Llama-3.1-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-8B-Instruct)
-Models deployed as a service can be consumed using either the chat or the completions API, depending on the type of model you deployed.
-1. Select your project or hub and then select **Deployments** from the left menu.
+# [Meta Llama-3](#tab/meta-llama-3)
-1. Find and select the deployment you created.
+Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
-1. Select **Open in playground**.
-1. Select **View code** and copy the **Endpoint** URL and the **Key** value.
+The following models are available:
-1. Make an API request based on the type of model you deployed.
+* [Meta-Llama-3-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-70B-Instruct)
+* [Meta-Llama-3-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-8B-Instruct)
- - For completions models, such as `Meta-Llama-2-7B`, use the [`/v1/completions`](#completions-api) API or the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/completions`.
- - For chat models, such as `Meta-Llama-2-7B-Chat`, use the [`/v1/chat/completions`](#chat-api) API or the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions`.
- For more information on using the APIs, see the [reference](#reference-for-meta-llama-31-models-deployed-as-a-service) section.
+# [Meta Llama-2](#tab/meta-llama-2)
+
+Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama-2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
++
+The following models are available:
+
+* [Llama-2-70b-chat](https://aka.ms/azureai/landing/Llama-2-70b-chat)
+* [Llama-2-13b-chat](https://aka.ms/azureai/landing/Llama-2-13b-chat)
+* [Llama-2-7b-chat](https://aka.ms/azureai/landing/Llama-2-7b-chat)
+
-### Reference for Meta Llama 3.1 models deployed as a service
+## Prerequisites
-Llama models accept both the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` or a [Llama Chat API](#chat-api) on `/v1/chat/completions`. In the same way, text completions can be generated using the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/completions` or a [Llama Completions API](#completions-api) on `/v1/completions`
+To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
-The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
+### A model deployment
-#### Completions API
+**Deployment to serverless APIs**
-Use the method `POST` to send the request to the `/v1/completions` route:
+Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
-__Request__
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-```rest
-POST /v1/completions HTTP/1.1
-Host: <DEPLOYMENT_URI>
-Authorization: Bearer <TOKEN>
-Content-type: application/json
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+**Deployment to a self-hosted managed compute**
+
+Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
```
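
The client snippets in this section read the endpoint URL and key from the environment variables `AZURE_INFERENCE_ENDPOINT` and `AZURE_INFERENCE_CREDENTIAL`. As a minimal sketch, assuming a bash shell, you might set them like this:

```bash
# Placeholders: substitute your deployment's host name, region, and key.
export AZURE_INFERENCE_ENDPOINT="https://<your-host-name>.<your-azure-region>.inference.ai.azure.com"
export AZURE_INFERENCE_CREDENTIAL="<your-32-character-key>"
```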
-#### Request schema
+## Work with chat completions
-Payload is a JSON formatted string containing the following parameters:
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
-| Key | Type | Default | Description |
-||--||-|
-| `prompt` | `string` | No default. This value must be specified. | The prompt to send to the model. |
-| `stream` | `boolean` | `False` | Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available. |
-| `max_tokens` | `integer` | `16` | The maximum number of tokens to generate in the completion. The token count of your prompt plus `max_tokens` can't exceed the model's context length. |
-| `top_p` | `float` | `1` | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering `top_p` or `temperature`, but not both. |
-| `temperature` | `float` | `1` | The sampling temperature to use, between 0 and 2. Higher values mean the model samples more broadly the distribution of tokens. Zero means greedy sampling. We recommend altering this or `top_p`, but not both. |
-| `n` | `integer` | `1` | How many completions to generate for each prompt. <br>Note: Because this parameter generates many completions, it can quickly consume your token quota. |
-| `stop` | `array` | `null` | String or a list of strings containing the word where the API stops generating further tokens. The returned text won't contain the stop sequence. |
-| `best_of` | `integer` | `1` | Generates `best_of` completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with `n`, `best_of` controls the number of candidate completions and `n` specifies how many to return; `best_of` must be greater than `n`. <br>Note: Because this parameter generates many completions, it can quickly consume your token quota.|
-| `logprobs` | `integer` | `null` | A number indicating to include the log probabilities on the `logprobs` most likely tokens and the chosen tokens. For example, if `logprobs` is 10, the API returns a list of the 10 most likely tokens. The API always returns the logprob of the sampled token, so there might be up to `logprobs`+1 elements in the response. |
-| `presence_penalty` | `float` | `null` | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
-| `ignore_eos` | `boolean` | `True` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. |
-| `use_beam_search` | `boolean` | `False` | Whether to use beam search instead of sampling. In such case, `best_of` must be greater than `1` and `temperature` must be `0`. |
-| `stop_token_ids` | `array` | `null` | List of IDs for tokens that, when generated, stop further token generation. The returned output contains the stop tokens unless the stop tokens are special tokens. |
-| `skip_special_tokens` | `boolean` | `null` | Whether to skip special tokens in the output. |
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama chat models.
-#### Example
+### Create a client to consume the model
-__Body__
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
-```json
-{
- "prompt": "What's the distance to the moon?",
- "temperature": 0.8,
- "max_tokens": 512
+
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { DefaultAzureCredential } from "@azure/identity";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new DefaultAzureCredential()
+);
+```
+
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: Meta-Llama-3.1-405B-Instruct
+Model type: chat-completions
+Model provider name: Meta
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Meta-Llama-3.1-405B-Instruct
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
+
+To stream completions, use `.asNodeStream()` when you call the model.
+
+You can visualize how streaming generates content:
++
+```javascript
+// createSseStream comes from the @azure/core-sse package.
+import { createSseStream } from "@azure/core-sse";
+
+var stream = response.body;
+if (!stream) {
+ throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+ throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
} ```
-#### Response schema
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ presence_penalty: 0.1,
+ frequency_penalty: 0.8,
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+> [!WARNING]
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
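
You can steer the model toward JSON through the prompt itself. The following sketch reuses the client from the earlier snippets; the `estimate` field name is only an illustration, and you still need to validate that the output parses as JSON:

```javascript
var messages = [
    { role: "system", content: "You are a helpful assistant. Respond only with a single JSON object." },
    { role: "user", content: "How many languages are in the world? Answer with a single field named `estimate`." },
];

var response = await client.path("/chat/completions").post({
    body: {
        messages: messages,
    }
});

if (isUnexpected(response)) {
    throw response.body.error;
}

// The model is only prompted to emit JSON, so validate the output before relying on it.
try {
    console.log(JSON.parse(response.body.choices[0].message.content));
} catch {
    console.log("The model didn't return valid JSON:", response.body.choices[0].message.content);
}
```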
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports them. When you make the request to the underlying model, include the header `extra-parameters` with the value `pass-through`. This value tells the endpoint to pass the extra parameters through to the model. Passing extra parameters doesn't guarantee that the model can actually handle them; read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+The following extra parameters can be passed to Meta Llama chat models:
+
+| Name | Description | Type |
+| -- | | |
+| `n` | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `best_of` | Generates `best_of` completions server-side and returns the *best* (the one with the lowest log probability per token). Results can't be streamed. When used with `n`, `best_of` controls the number of candidate completions and `n` specifies how many to return; `best_of` must be greater than `n`. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `logprobs` | The number of most likely tokens to return log probabilities for, along with the chosen tokens. For example, if `logprobs` is 10, the API returns a list of the 10 most likely tokens. The API always returns the logprob of the sampled token, so there might be up to `logprobs`+1 elements in the response. | `integer` |
+| `ignore_eos` | Whether to ignore the `EOS` token and continue generating tokens after the `EOS` token is generated. | `boolean` |
+| `use_beam_search` | Whether to use beam search instead of sampling. In such case, `best_of` must be greater than 1 and temperature must be 0. | `boolean` |
+| `stop_token_ids` | List of IDs for tokens that, when generated, stop further token generation. The returned output contains the stop tokens unless the stop tokens are special tokens. | `array` |
+| `skip_special_tokens` | Whether to skip special tokens in the output. | `boolean` |
++
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+
-The response payload is a dictionary with the following fields.
+```javascript
+try {
+ var messages = [
+ { role: "system", content: "You are an AI assistant that helps people find information." },
+ { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
+ ];
-| Key | Type | Description |
-|--|--|--|
-| `id` | `string` | A unique identifier for the completion. |
-| `choices` | `array` | The list of completion choices the model generated for the input prompt. |
-| `created` | `integer` | The Unix timestamp (in seconds) of when the completion was created. |
-| `model` | `string` | The model_id used for completion. |
-| `object` | `string` | The object type, which is always `text_completion`. |
-| `usage` | `object` | Usage statistics for the completion request. |
+ var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+ });
+
+ console.log(response.body.choices[0].message.content);
+}
+catch (error) {
+ if (error.status_code == 400) {
+ var response = JSON.parse(error.response._content);
+ if (response.error) {
+ console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
+ }
+ else
+ {
+ throw error;
+ }
+ }
+}
+```
> [!TIP]
-> In the streaming mode, for each chunk of response, `finish_reason` is always `null`, except from the last one which is terminated by a payload `[DONE]`.
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
-The `choices` object is a dictionary with the following fields.
-| Key | Type | Description |
-||--||
-| `index` | `integer` | Choice index. When `best_of` > 1, the index in this array might not be in order and might not be 0 to n-1. |
-| `text` | `string` | Completion result. |
-| `finish_reason` | `string` | The reason the model stopped generating tokens: <br>- `stop`: model hit a natural stop point, or a provided stop sequence. <br>- `length`: if max number of tokens have been reached. <br>- `content_filter`: When RAI moderates and CMP forces moderation. <br>- `content_filter_error`: an error during moderation and wasn't able to make decision on the response. <br>- `null`: API response still in progress or incomplete. |
-| `logprobs` | `object` | The log probabilities of the generated tokens in the output text. |
-The `usage` object is a dictionary with the following fields.
-| Key | Type | Value |
-||--|--|
-| `prompt_tokens` | `integer` | Number of tokens in the prompt. |
-| `completion_tokens` | `integer` | Number of tokens generated in the completion. |
-| `total_tokens` | `integer` | Total tokens. |
+## Meta Llama chat models
-The `logprobs` object is a dictionary with the following fields:
+The Meta Llama chat models include the following models:
-| Key | Type | Value |
-||-|-|
-| `text_offsets` | `array` of `integers` | The position or index of each token in the completion output. |
-| `token_logprobs` | `array` of `float` | Selected `logprobs` from dictionary in `top_logprobs` array. |
-| `tokens` | `array` of `string` | Selected tokens. |
-| `top_logprobs` | `array` of `dictionary` | Array of dictionary. In each dictionary, the key is the token and the value is the prob. |
+# [Meta Llama-3.1](#tab/meta-llama-3-1)
-#### Example
+The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
-```json
+
+The following models are available:
+
+* [Meta-Llama-3.1-405B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-405B-Instruct)
+* [Meta-Llama-3.1-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-70B-Instruct)
+* [Meta-Llama-3.1-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-8B-Instruct)
++
+# [Meta Llama-3](#tab/meta-llama-3)
+
+Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
++
+The following models are available:
+
+* [Meta-Llama-3-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-70B-Instruct)
+* [Meta-Llama-3-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-8B-Instruct)
++
+# [Meta Llama-2](#tab/meta-llama-2)
+
+Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama-2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
++
+The following models are available:
+
+* [Llama-2-70b-chat](https://aka.ms/azureai/landing/Llama-2-70b-chat)
+* [Llama-2-13b-chat](https://aka.ms/azureai/landing/Llama-2-13b-chat)
+* [Llama-2-7b-chat](https://aka.ms/azureai/landing/Llama-2-7b-chat)
++++
+## Prerequisites
+
+To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+**Deployment to a self-hosted managed compute**
+
+Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you might not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
++
+```csharp
+client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new DefaultAzureCredential(includeInteractiveCredentials: true)
+);
+```
+
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: Meta-Llama-3.1-405B-Instruct
+Model type: chat-completions
+Model provider name: Meta
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
{
- "id": "12345678-1234-1234-1234-abcdefghijkl",
- "object": "text_completion",
- "created": 217877,
- "choices": [
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Meta-Llama-3.1-405B-Instruct
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
+
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
+}
+```
+
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that in this example, the call is wrapped in an asynchronous method.
+
+To visualize the output, define an asynchronous method to print the stream in the console.
+
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
+{
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
{
- "index": 0,
- "text": "The Moon is an average of 238,855 miles away from Earth, which is about 30 Earths away.",
- "logprobs": null,
- "finish_reason": "stop"
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
}
- ],
- "usage": {
- "prompt_tokens": 7,
- "total_tokens": 23,
- "completion_tokens": 16
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
+}
+```
+
+You can visualize how streaming generates content:
++
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+> [!WARNING]
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports them. When you make the request to the underlying model, include the header `extra-parameters` with the value `pass-through`. This value tells the endpoint to pass the extra parameters through to the model. Passing extra parameters doesn't guarantee that the model can actually handle them; read the model's documentation to understand which extra parameters are supported.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+The following extra parameters can be passed to Meta Llama chat models:
+
+| Name | Description | Type |
+| -- | | |
+| `n` | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `best_of` | Generates `best_of` completions server-side and returns the *best* (the one with the lowest log probability per token). Results can't be streamed. When used with `n`, `best_of` controls the number of candidate completions and `n` specifies how many to return; `best_of` must be greater than `n`. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `logprobs` | The number of most likely tokens to return log probabilities for, along with the chosen tokens. For example, if `logprobs` is 10, the API returns a list of the 10 most likely tokens. The API always returns the logprob of the sampled token, so there might be up to `logprobs`+1 elements in the response. | `integer` |
+| `ignore_eos` | Whether to ignore the `EOS` token and continue generating tokens after the `EOS` token is generated. | `boolean` |
+| `use_beam_search` | Whether to use beam search instead of sampling. In such case, `best_of` must be greater than 1 and temperature must be 0. | `boolean` |
+| `stop_token_ids` | List of IDs for tokens that, when generated, stop further token generation. The returned output contains the stop tokens unless the stop tokens are special tokens. | `array` |
+| `skip_special_tokens` | Whether to skip special tokens in the output. | `boolean` |
++
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```csharp
+try
+{
+ requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+ new ChatRequestUserMessage(
+ "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ ),
+ },
+ };
+
+ response = client.Complete(requestOptions);
+ Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+ if (ex.ErrorCode == "content_filter")
+ {
+ Console.WriteLine($"Your query has trigger Azure Content Safeaty: {ex.Message}");
+ }
+ else
+ {
+ throw;
} } ```
-#### Chat API
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
++++
+## Meta Llama chat models
+
+The Meta Llama chat models include the following models:
+
+# [Meta Llama-3.1](#tab/meta-llama-3-1)
+
+The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
++
+The following models are available:
-Use the method `POST` to send the request to the `/v1/chat/completions` route:
+* [Meta-Llama-3.1-405B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-405B-Instruct)
+* [Meta-Llama-3.1-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-70B-Instruct)
+* [Meta-Llama-3.1-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3.1-8B-Instruct)
-__Request__
-```rest
-POST /v1/chat/completions HTTP/1.1
-Host: <DEPLOYMENT_URI>
+# [Meta Llama-3](#tab/meta-llama-3)
+
+Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.
++
+The following models are available:
+
+* [Meta-Llama-3-70B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-70B-Instruct)
+* [Meta-Llama-3-8B-Instruct](https://aka.ms/azureai/landing/Meta-Llama-3-8B-Instruct)
++
+# [Meta Llama-2](#tab/meta-llama-2)
+
+Meta has developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama-2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
++
+The following models are available:
+
+* [Llama-2-70b-chat](https://aka.ms/azureai/landing/Llama-2-70b-chat)
+* [Llama-2-13b-chat](https://aka.ms/azureai/landing/Llama-2-13b-chat)
+* [Llama-2-7b-chat](https://aka.ms/azureai/landing/Llama-2-7b-chat)
++++
+## Prerequisites
+
+To use Meta Llama chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Meta Llama chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+**Deployment to a self-hosted managed compute**
+
+Meta Llama chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### A REST client
+
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
+
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Meta Llama chat models.
+
+### Create a client to consume the model
+
+When you use the REST API directly, there's no client object to create. Instead, you send requests to your endpoint URL and authenticate each request by passing your key in the `Authorization` header.
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can authenticate the same requests with a Microsoft Entra ID token instead of a key.
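
For example, a minimal sketch of a chat completions request with key-based authentication looks like the following, where `<ENDPOINT_URI>` and `<TOKEN>` are placeholders for your endpoint host and key:

```http
POST /chat/completions HTTP/1.1
Host: <ENDPOINT_URI>
Authorization: Bearer <TOKEN>
Content-Type: application/json
```

The request body is the JSON payload shown in the examples that follow.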
+
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
Authorization: Bearer <TOKEN>
-Content-type: application/json
+Content-Type: application/json
```
-#### Request schema
+The response is as follows:
-Payload is a JSON formatted string containing the following parameters:
-| Key | Type | Default | Description |
-|--|--|--|--|
-| `messages` | `string` | No default. This value must be specified. | The message or history of messages to use to prompt the model. |
-| `stream` | `boolean` | `False` | Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available. |
-| `max_tokens` | `integer` | `16` | The maximum number of tokens to generate in the completion. The token count of your prompt plus `max_tokens` can't exceed the model's context length. |
-| `top_p` | `float` | `1` | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering `top_p` or `temperature`, but not both. |
-| `temperature` | `float` | `1` | The sampling temperature to use, between 0 and 2. Higher values mean the model samples more broadly the distribution of tokens. Zero means greedy sampling. We recommend altering this or `top_p`, but not both. |
-| `n` | `integer` | `1` | How many completions to generate for each prompt. <br>Note: Because this parameter generates many completions, it can quickly consume your token quota. |
-| `stop` | `array` | `null` | String or a list of strings containing the word where the API stops generating further tokens. The returned text won't contain the stop sequence. |
-| `best_of` | `integer` | `1` | Generates `best_of` completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with `n`, `best_of` controls the number of candidate completions and `n` specifies how many to return; `best_of` must be greater than `n`. <br>Note: Because this parameter generates many completions, it can quickly consume your token quota.|
-| `logprobs` | `integer` | `null` | A number indicating to include the log probabilities on the `logprobs` most likely tokens and the chosen tokens. For example, if `logprobs` is 10, the API returns a list of the 10 most likely tokens. The API always returns the logprob of the sampled token, so there might be up to `logprobs`+1 elements in the response. |
-| `presence_penalty` | `float` | `null` | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
-| `ignore_eos` | `boolean` | `True` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. |
-| `use_beam_search` | `boolean` | `False` | Whether to use beam search instead of sampling. In such case, `best_of` must be greater than `1` and `temperature` must be `0`. |
-| `stop_token_ids` | `array` | `null` | List of IDs for tokens that, when generated, stop further token generation. The returned output contains the stop tokens unless the stop tokens are special tokens.|
-| `skip_special_tokens` | `boolean` | `null` | Whether to skip special tokens in the output. |
+```json
+{
+ "model_name": "Meta-Llama-3.1-405B-Instruct",
+ "model_type": "chat-completions",
+ "model_provider_name": "Meta"
+}
+```
-The `messages` object has the following fields:
+### Create a chat completion request
-| Key | Type | Value |
-|--|--||
-| `content` | `string` | The contents of the message. Content is required for all messages. |
-| `role` | `string` | The role of the message's author. One of `system`, `user`, or `assistant`. |
+The following example shows how you can create a basic chat completions request to the model.
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
-#### Example
+The response is as follows, where you can see the model's usage statistics:
-__Body__
```json {
- "messages":
- [
- {
- "role": "system",
- "content": "You are a helpful assistant that translates English to Italian."},
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Meta-Llama-3.1-405B-Instruct",
+ "choices": [
{
- "role": "user",
- "content": "Translate the following sentence from English to Italian: I love programming."
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
} ],
- "temperature": 0.8,
- "max_tokens": 512,
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
} ```
-#### Response schema
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
-The response payload is a dictionary with the following fields.
+#### Stream content
-| Key | Type | Description |
-|--|--|-|
-| `id` | `string` | A unique identifier for the completion. |
-| `choices` | `array` | The list of completion choices the model generated for the input messages. |
-| `created` | `integer` | The Unix timestamp (in seconds) of when the completion was created. |
-| `model` | `string` | The model_id used for completion. |
-| `object` | `string` | The object type, which is always `chat.completion`. |
-| `usage` | `object` | Usage statistics for the completion request. |
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
-> [!TIP]
-> In the streaming mode, for each chunk of response, `finish_reason` is always `null`, except from the last one which is terminated by a payload `[DONE]`. In each `choices` object, the key for `messages` is changed by `delta`.
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
-The `choices` object is a dictionary with the following fields.
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
-| Key | Type | Description |
-||--|--|
-| `index` | `integer` | Choice index. When `best_of` > 1, the index in this array might not be in order and might not be `0` to `n-1`. |
-| `messages` or `delta` | `string` | Chat completion result in `messages` object. When streaming mode is used, `delta` key is used. |
-| `finish_reason` | `string` | The reason the model stopped generating tokens: <br>- `stop`: model hit a natural stop point or a provided stop sequence. <br>- `length`: if max number of tokens have been reached. <br>- `content_filter`: When RAI moderates and CMP forces moderation <br>- `content_filter_error`: an error during moderation and wasn't able to make decision on the response <br>- `null`: API response still in progress or incomplete.|
-| `logprobs` | `object` | The log probabilities of the generated tokens in the output text. |
+You can visualize how streaming generates content:
-The `usage` object is a dictionary with the following fields.
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Meta-Llama-3.1-405B-Instruct",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
+ }
+ ]
+}
+```
+
+The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
+
-| Key | Type | Value |
-||--|--|
-| `prompt_tokens` | `integer` | Number of tokens in the prompt. |
-| `completion_tokens` | `integer` | Number of tokens generated in the completion. |
-| `total_tokens` | `integer` | Total tokens. |
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Meta-Llama-3.1-405B-Instruct",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
-The `logprobs` object is a dictionary with the following fields:
+#### Explore more parameters supported by the inference client
-| Key | Type | Value |
-||-||
-| `text_offsets` | `array` of `integers` | The position or index of each token in the completion output. |
-| `token_logprobs` | `array` of `float` | Selected `logprobs` from dictionary in `top_logprobs` array. |
-| `tokens` | `array` of `string` | Selected tokens. |
-| `top_logprobs` | `array` of `dictionary` | Array of dictionary. In each dictionary, the key is the token and the value is the prob. |
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
-#### Example
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+ "temperature" :0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
-The following is an example response:
```json {
- "id": "12345678-1234-1234-1234-abcdefghijkl",
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
"object": "chat.completion",
- "created": 2012359,
- "model": "",
+ "created": 1718726686,
+ "model": "Meta-Llama-3.1-405B-Instruct",
"choices": [ { "index": 0,
- "finish_reason": "stop",
"message": { "role": "assistant",
- "content": "Sure, I\'d be happy to help! The translation of ""I love programming"" from English to Italian is:\n\n""Amo la programmazione.""\n\nHere\'s a breakdown of the translation:\n\n* ""I love"" in English becomes ""Amo"" in Italian.\n* ""programming"" in English becomes ""la programmazione"" in Italian.\n\nI hope that helps! Let me know if you have any other sentences you\'d like me to translate."
- }
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
} ], "usage": {
- "prompt_tokens": 10,
- "total_tokens": 40,
- "completion_tokens": 30
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
} } ```
-
-## Deploy Meta Llama models to managed compute
-Apart from deploying with the pay-as-you-go managed service, you can also deploy Meta Llama 3.1 models to managed compute in AI Studio. When deployed to managed compute, you can select all the details about the infrastructure running the model, including the virtual machines to use and the number of instances to handle the load you're expecting. Models deployed to managed compute consume quota from your subscription. The following models from the 3.1 release wave are available on managed compute:
-- `Meta-Llama-3.1-8B-Instruct` (FT supported)
-- `Meta-Llama-3.1-70B-Instruct` (FT supported)
-- `Meta-Llama-3.1-8B` (FT supported)
-- `Meta-Llama-3.1-70B` (FT supported)
-- `Llama Guard 3 8B`
-- `Prompt Guard`
+> [!WARNING]
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
-Follow these steps to deploy a model such as `Meta-Llama-3.1-70B-Instruct ` to a managed compute in [Azure AI Studio](https://ai.azure.com).
+### Pass extra parameters to the model
-1. Choose the model you want to deploy from the Azure AI Studio [model catalog](https://ai.azure.com/explore/models).
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
- Alternatively, you can initiate deployment by starting from your project in AI Studio. Select your project and then select **Deployments** > **+ Create**.
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
-1. On the model's **Details** page, select **Deploy** next to the **View license** button.
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
- :::image type="content" source="../media/deploy-monitor/llama/deploy-real-time-endpoint.png" alt-text="A screenshot showing how to deploy a model with the managed compute option." lightbox="../media/deploy-monitor/llama/deploy-real-time-endpoint.png":::
-1. On the **Deploy with Azure AI Content Safety (preview)** page, select **Skip Azure AI Content Safety** so that you can continue to deploy the model using the UI.
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "logprobs": true
+}
+```
- > [!TIP]
- > In general, we recommend that you select **Enable Azure AI Content Safety (Recommended)** for deployment of the Llama model. This deployment option is currently only supported using the Python SDK and it happens in a notebook.
+The following extra parameters can be passed to Meta Llama chat models:
-1. Select **Proceed**.
-1. Select the project where you want to create a deployment.
+| Name | Description | Type |
+| -- | | |
+| `n` | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `best_of` | Generates `best_of` completions server-side and returns the *best* (the one with the lowest log probability per token). Results can't be streamed. When used with `n`, `best_of` controls the number of candidate completions and `n` specifies how many to return; `best_of` must be greater than `n`. Note: Because this parameter generates many completions, it can quickly consume your token quota. | `integer` |
+| `logprobs` | The number of most likely tokens to return log probabilities for, along with the chosen tokens. For example, if `logprobs` is 10, the API returns a list of the 10 most likely tokens. The API always returns the logprob of the sampled token, so there might be up to `logprobs`+1 elements in the response. | `integer` |
+| `ignore_eos` | Whether to ignore the `EOS` token and continue generating tokens after the `EOS` token is generated. | `boolean` |
+| `use_beam_search` | Whether to use beam search instead of sampling. In such case, `best_of` must be greater than 1 and temperature must be 0. | `boolean` |
+| `stop_token_ids` | List of IDs for tokens that, when generated, stop further token generation. The returned output contains the stop tokens unless the stop tokens are special tokens. | `array` |
+| `skip_special_tokens` | Whether to skip special tokens in the output. | `boolean` |
- > [!TIP]
- > If you don't have enough quota available in the selected project, you can use the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours**.
-
-1. Select the **Virtual machine** and the **Instance count** that you want to assign to the deployment.
-1. Select if you want to create this deployment as part of a new endpoint or an existing one. Endpoints can host multiple deployments while keeping resource configuration exclusive for each of them. Deployments under the same endpoint share the endpoint URI and its access keys.
-
-1. Indicate if you want to enable **Inferencing data collection (preview)**.
+### Apply content safety
-1. Select **Deploy**. After a few moments, the endpoint's **Details** page opens up.
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
-1. Wait for the endpoint creation and deployment to finish. This step can take a few minutes.
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-1. Select the **Consume** tab of the deployment to obtain code samples that can be used to consume the deployed model in your application.
-### Consume Llama 2 models deployed to managed compute
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are an AI assistant that helps people find information."
+ },
+ {
+ "role": "user",
+ "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ }
+ ]
+}
+```
++
+```json
+{
+ "error": {
+ "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
+ "type": null,
+ "param": "prompt",
+ "code": "content_filter",
+ "status": 400
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-For reference about how to invoke Llama models deployed to managed compute, see the model's card in the Azure AI Studio [model catalog](../how-to/model-catalog-overview.md). Each model's card has an overview page that includes a description of the model, samples for code-based inferencing, fine-tuning, and model evaluation.
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
-##### More inference examples
-| **Package** | **Sample Notebook** |
-|-|-|
-| CLI using CURL and Python web requests | [webrequests.ipynb](https://aka.ms/meta-llama-3.1-405B-instruct-webrequests)|
-| OpenAI SDK (experimental) | [openaisdk.ipynb](https://aka.ms/meta-llama-3.1-405B-instruct-openai)|
-| LangChain | [langchain.ipynb](https://aka.ms/meta-llama-3.1-405B-instruct-langchain)|
-| LiteLLM SDK | [litellm.ipynb](https://aka.ms/meta-llama-3.1-405B-instruct-litellm) |
+## More inference examples
-## Cost and quotas
+For more examples of how to use Meta Llama, see the following examples and tutorials:
-### Cost and quota considerations for Meta Llama 3.1 models deployed as a service
+| Description | Language | Sample |
+|-|-|- |
+| CURL request | Bash | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-webrequests) |
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+| Python web requests | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-webrequests) |
+| OpenAI SDK (experimental) | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-openai) |
+| LangChain | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-langchain) |
+| LiteLLM | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-litellm) |
-Meta Llama 3.1 models deployed as a service are offered by Meta through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying or [fine-tuning the models](./fine-tune-model-llama.md).
+## Cost and quota considerations for Meta Llama family of models deployed as serverless API endpoints
-Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently.
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
-For more information on how to track costs, see [monitor costs for models offered throughout the Azure Marketplace](./costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
+Meta Llama models deployed as a serverless API are offered by Meta through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
-Quota is managed per deployment. Each deployment has a rate limit of 400,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
-### Cost and quota considerations for Meta Llama 3.1 models deployed as managed compute
+## Cost and quota considerations for Meta Llama family of models deployed to managed compute
-For deployment and inferencing of Meta Llama 3.1 models with managed compute, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once you reach this limit, you can request a quota increase.
+Meta Llama models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.
-## Content filtering
+It is a good practice to start with a low number of instances and scale up as needed. You can monitor the cost of the compute instance in the Azure portal.
-Models deployed as a serverless API with pay-as-you-go are protected by Azure AI Content Safety. When deployed to managed compute, you can opt out of this capability. With Azure AI content safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](../concepts/content-filtering.md).
+## Related content
-## Next steps
-- [What is Azure AI Studio?](../what-is-ai-studio.md)
-- [Fine-tune a Meta Llama 3.1 models in Azure AI Studio](fine-tune-model-llama.md)
-- [Azure AI FAQ article](../faq.yml)
-- [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Mistral Nemo https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-mistral-nemo.md
+
+ Title: How to use Mistral Nemo chat model with Azure AI Studio
+
+description: Learn how to use Mistral Nemo chat model with Azure AI Studio.
+++ Last updated : 08/08/2024+
+reviewer: fkriti
+++
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
++
+# How to use Mistral Nemo chat model
+
+In this article, you learn about Mistral Nemo chat model and how to use it.
+Mistral AI offers two categories of models: premium models, including [Mistral Large and Mistral Small](deploy-models-mistral.md), which are available as serverless APIs with pay-as-you-go token-based billing; and open models, including [Mistral Nemo](deploy-models-mistral-nemo.md) and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](deploy-models-mistral-open.md), which you can also download and run on self-hosted managed endpoints.
+++
+## Mistral Nemo chat model
+
+Mistral Nemo is a cutting-edge large language model (LLM) boasting state-of-the-art reasoning, world knowledge, and coding capabilities within its size category.
+
+Mistral Nemo is a 12B model, making it a powerful drop-in replacement for any system using Mistral 7B, which it supersedes. It supports a context length of 128K, and it accepts only text inputs and generates text outputs.
+
+Additionally, Mistral Nemo is:
+
+* **Jointly developed with Nvidia**. This collaboration has resulted in a powerful 12B model that pushes the boundaries of language understanding and generation.
+* **Multilingual proficient**. Mistral Nemo is equipped with a tokenizer called Tekken, which is designed for multilingual applications. It supports over 100 languages, such as English, French, German, and Spanish. Tekken is more efficient than the Llama 3 tokenizer in compressing text for approximately 85% of all languages, with significant improvements in Malayalam, Hindi, Arabic, and prevalent European languages.
+* **Agent-centric**. Mistral Nemo possesses top-tier agentic capabilities, including native function calling and JSON outputting.
+* **Advanced in reasoning**. Mistral Nemo demonstrates state-of-the-art mathematical and reasoning capabilities within its size category.
++
+You can learn more about the models in their respective model card:
+
+* [Mistral-Nemo](https://aka.ms/azureai/landing/Mistral-Nemo)
++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral Nemo chat model with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral Nemo chat model can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
+
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral Nemo chat model.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
+
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```python
+model_info = client.get_model_info()
+```
+
+The response is as follows:
++
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
+
+```console
+Model name: Mistral-Nemo
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Mistral-Nemo
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
+
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+ """
+ Prints the chat completion with streaming. Some delay is added to simulate
+ a real-time conversation.
+ """
+ import time
+ for update in result:
+ if update.choices:
+ print(update.choices[0].delta.content, end="")
+ time.sleep(0.05)
+```
+
+You can visualize how streaming generates content:
++
+```python
+print_stream(result)
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral Nemo chat model can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```python
+response = client.complete(
+ messages=[
+        SystemMessage(content="You are a helpful assistant that always generates responses in JSON format, using"
+                      " the following format: { \"answer\": \"response\" }."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
+)
+```
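+
+If the model followed the instructed format, you can then parse the message content as JSON. The following is a minimal sketch; it assumes the model returned the `{ "answer": "..." }` shape requested in the system message and that the output wasn't truncated.
+
+```python
+import json
+
+# Parse the JSON string returned by the model. Guard against invalid JSON in case the
+# model didn't follow the requested format or the generation was cut off.
+try:
+    data = json.loads(response.choices[0].message.content)
+    print(data["answer"])
+except json.JSONDecodeError:
+    print("The model didn't return valid JSON:", response.choices[0].message.content)
+```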
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+The following extra parameters can be passed to Mistral Nemo chat model:
+
+| Name | Description | Type |
+| -- | | |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
++
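+For example, the following sketch passes `ignore_eos` through `model_extras`, using the same pass-through mechanism shown previously. Whether the model actually honors the parameter depends on the deployment, so treat this as illustrative.
+
+```python
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful assistant."),
+        UserMessage(content="How many languages are in the world?"),
+    ],
+    max_tokens=256,
+    model_extras={
+        # Keep generating past the EOS token (illustrative; bounded here by max_tokens).
+        "ignore_eos": True
+    }
+)
+```
+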
+### Safe mode
+
+Mistral Nemo chat model supports the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "safe_mode": True
+ }
+)
+```
+
+### Use tools
+
+Mistral Nemo chat model supports the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information for trips between two cities.
++
+```python
+from azure.ai.inference.models import FunctionDefinition, ChatCompletionsFunctionToolDefinition
+
+flight_info = ChatCompletionsFunctionToolDefinition(
+ function=FunctionDefinition(
+ name="get_flight_info",
+ description="Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ parameters={
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates",
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city",
+ },
+ },
+ "required": ["origin_city", "destination_city"],
+ },
+ )
+)
+
+tools = [flight_info]
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```python
+def get_flight_info(loc_origin: str, loc_destination: str):
+ return {
+ "info": f"There are no flights available from {loc_origin} to {loc_destination}. You should take a train, specially if it helps to reduce CO2 emissions."
+ }
+```
+
+Prompt the model to book flights with the help of this function:
++
+```python
+messages = [
+ SystemMessage(
+ content="You are a helpful assistant that help users to find information about traveling, how to get"
+ " to places and the different transportations options. You care about the environment and you"
+ " always have that in mind when answering inqueries.",
+ ),
+ UserMessage(
+ content="When is the next flight from Miami to Seattle?",
+ ),
+]
+
+response = client.complete(
+ messages=messages, tools=tools, tool_choice="auto"
+)
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```python
+response_message = response.choices[0].message
+tool_calls = response_message.tool_calls
+
+print("Finish reason:", response.choices[0].finish_reason)
+print("Tool call:", tool_calls)
+```
+
+To continue, append this message to the chat history:
++
+```python
+messages.append(
+ response_message
+)
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```python
+import json
+from azure.ai.inference.models import ToolMessage
+
+for tool_call in tool_calls:
+
+ # Get the tool details:
+
+ function_name = tool_call.function.name
+ function_args = json.loads(tool_call.function.arguments.replace("\'", "\""))
+ tool_call_id = tool_call.id
+
+ print(f"Calling function `{function_name}` with arguments {function_args}")
+
+ # Call the function defined above using `locals()`, which returns the list of all functions
+ # available in the scope as a dictionary. Notice that this is just done as a simple way to get
+ # the function callable from its string name. Then we can call it with the corresponding
+ # arguments.
+
+ callable_func = locals()[function_name]
+ function_response = callable_func(**function_args)
+
+ print("->", function_response)
+
+ # Once we have a response from the function and its arguments, we can append a new message to the chat
+    # history. Notice how we are telling the model that this chat message came from a tool:
+
+ messages.append(
+ ToolMessage(
+ tool_call_id=tool_call_id,
+ content=json.dumps(function_response)
+ )
+ )
+```
+
+View the response from the model:
++
+```python
+response = client.complete(
+ messages=messages,
+ tools=tools,
+)
+```
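+
+To display the model's final answer, which now incorporates the tool's output, you can print the message content, following the same pattern used earlier in this article:
+
+```python
+print("Response:", response.choices[0].message.content)
+```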
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```python
+from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
+from azure.core.exceptions import HttpResponseError
+
+try:
+ response = client.complete(
+ messages=[
+ SystemMessage(content="You are an AI assistant that helps people find information."),
+ UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
+ ]
+ )
+
+ print(response.choices[0].message.content)
+
+except HttpResponseError as ex:
+ if ex.status_code == 400:
+ response = ex.response.json()
+        if isinstance(response, dict) and "error" in response:
+            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
+        else:
+            raise
+    else:
+        raise
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Mistral Nemo chat model
+
+Mistral Nemo is a cutting-edge large language model (LLM) boasting state-of-the-art reasoning, world knowledge, and coding capabilities within its size category.
+
+Mistral Nemo is a 12B model, making it a powerful drop-in replacement for any system using Mistral 7B, which it supersedes. It supports a context length of 128K, and it accepts only text inputs and generates text outputs.
+
+Additionally, Mistral Nemo is:
+
+* **Jointly developed with Nvidia**. This collaboration has resulted in a powerful 12B model that pushes the boundaries of language understanding and generation.
+* **Multilingual proficient**. Mistral Nemo is equipped with a tokenizer called Tekken, which is designed for multilingual applications. It supports over 100 languages, such as English, French, German, and Spanish. Tekken is more efficient than the Llama 3 tokenizer in compressing text for approximately 85% of all languages, with significant improvements in Malayalam, Hindi, Arabic, and prevalent European languages.
+* **Agent-centric**. Mistral Nemo possesses top-tier agentic capabilities, including native function calling and JSON outputting.
+* **Advanced in reasoning**. Mistral Nemo demonstrates state-of-the-art mathematical and reasoning capabilities within its size category.
++
+You can learn more about the models in their respective model card:
+
+* [Mistral-Nemo](https://aka.ms/azureai/landing/Mistral-Nemo)
++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral Nemo chat model with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral Nemo chat model can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral Nemo chat model.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: Mistral-Nemo
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Mistral-Nemo
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
+
+To stream completions, use `.asNodeStream()` when you call the model.
+
+You can visualize how streaming generates content:
++
+```javascript
+import { createSseStream } from "@azure/core-sse";
+
+var stream = response.body;
+if (!stream) {
+    throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+ throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ presence_penalty: "0.1",
+ frequency_penalty: "0.8",
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral Nemo chat model can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant that always generate responses in JSON format, using."
+ + " the following format: { \"answer\": \"response\" }." },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ response_format: { type: "json_object" }
+ }
+});
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+The following extra parameters can be passed to Mistral Nemo chat model:
+
+| Name | Description | Type |
+| -- | | |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
++
+### Safe mode
+
+Mistral Nemo chat model supports the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ safe_mode: true
+ }
+});
+```
+
+### Use tools
+
+Mistral Nemo chat model supports the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information for trips between two cities.
++
+```javascript
+const flight_info = {
+ name: "get_flight_info",
+ description: "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ parameters: {
+ type: "object",
+ properties: {
+ origin_city: {
+ type: "string",
+ description: "The name of the city where the flight originates",
+ },
+ destination_city: {
+ type: "string",
+ description: "The flight destination city",
+ },
+ },
+ required: ["origin_city", "destination_city"],
+ },
+}
+
+const tools = [
+ {
+ type: "function",
+ function: flight_info,
+ },
+];
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```javascript
+function get_flight_info(loc_origin, loc_destination) {
+ return {
+ info: "There are no flights available from " + loc_origin + " to " + loc_destination + ". You should take a train, specially if it helps to reduce CO2 emissions."
+ }
+}
+```
+
+Prompt the model to book flights with the help of this function:
++
+```javascript
+var result = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ tools: tools,
+ tool_choice: "auto"
+ }
+});
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```javascript
+const response_message = response.body.choices[0].message;
+const tool_calls = response_message.tool_calls;
+
+console.log("Finish reason: " + response.body.choices[0].finish_reason);
+console.log("Tool call: " + tool_calls);
+```
+
+To continue, append this message to the chat history:
++
+```javascript
+messages.push(response_message);
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```javascript
+function applyToolCall({ function: call }) {
+    // Get the tool details:
+    const tool_params = JSON.parse(call.arguments);
+    console.log("Calling function " + call.name + " with arguments " + JSON.stringify(tool_params));
+
+    // Look up the function defined above by its string name. Notice that this is just done as a
+    // simple way to get the function callable from its string name. Then we can call it with the
+    // corresponding arguments.
+    const available_functions = { get_flight_info: get_flight_info };
+    const function_response = available_functions[call.name](tool_params.origin_city, tool_params.destination_city);
+    console.log("-> " + JSON.stringify(function_response));
+
+    return function_response;
+}
+
+for (const tool_call of tool_calls) {
+    var tool_response = applyToolCall(tool_call);
+
+    messages.push(
+        {
+            role: "tool",
+            tool_call_id: tool_call.id,
+            content: JSON.stringify(tool_response)
+        }
+    );
+}
+
+View the response from the model:
++
+```javascript
+var result = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ tools: tools,
+ }
+});
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```javascript
+try {
+ var messages = [
+ { role: "system", content: "You are an AI assistant that helps people find information." },
+ { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
+ ];
+
+ var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+ });
+
+ console.log(response.body.choices[0].message.content);
+}
+catch (error) {
+ if (error.status_code == 400) {
+ var response = JSON.parse(error.response._content);
+ if (response.error) {
+ console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
+ }
+ else
+ {
+ throw error;
+ }
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Mistral Nemo chat model
+
+Mistral Nemo is a cutting-edge large language model (LLM) boasting state-of-the-art reasoning, world knowledge, and coding capabilities within its size category.
+
+Mistral Nemo is a 12B model, making it a powerful drop-in replacement for any system using Mistral 7B, which it supersedes. It supports a context length of 128K, and it accepts only text inputs and generates text outputs.
+
+Additionally, Mistral Nemo is:
+
+* **Jointly developed with Nvidia**. This collaboration has resulted in a powerful 12B model that pushes the boundaries of language understanding and generation.
+* **Multilingual proficient**. Mistral Nemo is equipped with a tokenizer called Tekken, which is designed for multilingual applications. It supports over 100 languages, such as English, French, German, and Spanish. Tekken is more efficient than the Llama 3 tokenizer in compressing text for approximately 85% of all languages, with significant improvements in Malayalam, Hindi, Arabic, and prevalent European languages.
+* **Agent-centric**. Mistral Nemo possesses top-tier agentic capabilities, including native function calling and JSON outputting.
+* **Advanced in reasoning**. Mistral Nemo demonstrates state-of-the-art mathematical and reasoning capabilities within its size category.
++
+You can learn more about the models in their respective model card:
+
+* [Mistral-Nemo](https://aka.ms/azureai/landing/Mistral-Nemo)
++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral Nemo chat model with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral Nemo chat model can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you might not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral Nemo chat model.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: Mistral-Nemo
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Mistral-Nemo
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
+
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
+}
+```
+
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that in this example, the call is wrapped in an asynchronous method.
+
+To visualize the output, define an asynchronous method to print the stream in the console.
+
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
+{
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
+ {
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
+ }
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
+}
+```
+
+You can visualize how streaming generates content:
++
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral Nemo chat model can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage(
+ "You are a helpful assistant that always generate responses in JSON format, " +
+ "using. the following format: { \"answer\": \"response\" }."
+ ),
+ new ChatRequestUserMessage(
+ "How many languages are in the world?"
+ )
+ },
+ ResponseFormat = new ChatCompletionsResponseFormatJSON()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+The following extra parameters can be passed to the Mistral Nemo chat model:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
++
+### Safe mode
+
+The Mistral Nemo chat model supports the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "safe_mode", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+### Use tools
+
+The Mistral Nemo chat model supports the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system, or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information between two different cities.
++
+```csharp
+FunctionDefinition flightInfoFunction = new FunctionDefinition("getFlightInfo")
+{
+ Description = "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ Parameters = BinaryData.FromObjectAsJson(new
+ {
+ Type = "object",
+ Properties = new
+ {
+ origin_city = new
+ {
+ Type = "string",
+ Description = "The name of the city where the flight originates"
+ },
+ destination_city = new
+ {
+ Type = "string",
+ Description = "The flight destination city"
+ }
+ }
+ },
+ new JsonSerializerOptions() { PropertyNamingPolicy = JsonNamingPolicy.CamelCase }
+ )
+};
+
+ChatCompletionsFunctionToolDefinition getFlightTool = new ChatCompletionsFunctionToolDefinition(flightInfoFunction);
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```csharp
+static string getFlightInfo(string loc_origin, string loc_destination)
+{
+ return JsonSerializer.Serialize(new
+ {
+ info = $"There are no flights available from {loc_origin} to {loc_destination}. You " +
+ "should take a train, specially if it helps to reduce CO2 emissions."
+ });
+}
+```
+
+Prompt the model to book flights with the help of this function:
++
+```csharp
+var chatHistory = new List<ChatRequestMessage>(){
+ new ChatRequestSystemMessage(
+ "You are a helpful assistant that help users to find information about traveling, " +
+ "how to get to places and the different transportations options. You care about the" +
+ "environment and you always have that in mind when answering inqueries."
+ ),
+ new ChatRequestUserMessage("When is the next flight from Miami to Seattle?")
+ };
+
+requestOptions = new ChatCompletionsOptions(chatHistory);
+requestOptions.Tools.Add(getFlightTool);
+requestOptions.ToolChoice = ChatCompletionsToolChoice.Auto;
+
+response = client.Complete(requestOptions);
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```csharp
+var responseMessage = response.Value.Choices[0].Message;
+var toolsCall = responseMessage.ToolCalls;
+
+Console.WriteLine($"Finish reason: {response.Value.Choices[0].FinishReason}");
+Console.WriteLine($"Tool call: {toolsCall[0].Id}");
+```
+
+To continue, append this message to the chat history:
++
+```csharp
+requestOptions.Messages.Add(new ChatRequestAssistantMessage(response.Value.Choices[0].Message));
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```csharp
+foreach (ChatCompletionsToolCall tool in toolsCall)
+{
+ if (tool is ChatCompletionsFunctionToolCall functionTool)
+ {
+ // Get the tool details:
+ string callId = functionTool.Id;
+ string toolName = functionTool.Name;
+ string toolArgumentsString = functionTool.Arguments;
+ Dictionary<string, object> toolArguments = JsonSerializer.Deserialize<Dictionary<string, object>>(toolArgumentsString);
+
+        // Here you have to call the function you defined. In this particular example, we use
+        // reflection to find the method we defined before in a static class called
+        // `ChatCompletionsExamples`. Using reflection allows us to call a function
+        // by its string name. Notice that this is done for demonstration purposes only, as a
+        // simple way to make the function callable from its string name. Then we can call
+        // it with the corresponding arguments.
+
+        var flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static;
+        string toolResponse = (string)typeof(ChatCompletionsExamples).GetMethod(toolName, flags).Invoke(null, toolArguments.Values.Cast<object>().ToArray());
+
+        Console.WriteLine($"-> {toolResponse}");
+ requestOptions.Messages.Add(new ChatRequestToolMessage(toolResponse, callId));
+ }
+ else
+ throw new Exception("Unsupported tool type");
+}
+```
+
+View the response from the model:
++
+```csharp
+response = client.Complete(requestOptions);
+```
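+
+As with earlier responses, you can print the assistant's final answer from the returned choice:
+
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```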
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```csharp
+try
+{
+ requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+ new ChatRequestUserMessage(
+ "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ ),
+ },
+ };
+
+ response = client.Complete(requestOptions);
+ Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+ if (ex.ErrorCode == "content_filter")
+ {
+ Console.WriteLine($"Your query has trigger Azure Content Safeaty: {ex.Message}");
+ }
+ else
+ {
+ throw;
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Mistral Nemo chat model
+
+Mistral Nemo is a cutting-edge large language model (LLM) boasting state-of-the-art reasoning, world knowledge, and coding capabilities within its size category.
+
+Mistral Nemo is a 12B model, making it a powerful drop-in replacement for any system using Mistral 7B, which it supersedes. It supports a context length of 128K, and it accepts only text inputs and generates text outputs.
+
+Additionally, Mistral Nemo is:
+
+* **Jointly developed with Nvidia**. This collaboration has resulted in a powerful 12B model that pushes the boundaries of language understanding and generation.
+* **Proficient in multiple languages**. Mistral Nemo is equipped with a tokenizer called Tekken, which is designed for multilingual applications. It supports over 100 languages, such as English, French, German, and Spanish. Tekken is more efficient than the Llama 3 tokenizer at compressing text for approximately 85% of all languages, with significant improvements in Malayalam, Hindi, Arabic, and prevalent European languages.
+* **Agent-centric**. Mistral Nemo possesses top-tier agentic capabilities, including native function calling and JSON outputting.
+* **Advanced in reasoning**. Mistral Nemo demonstrates state-of-the-art mathematical and reasoning capabilities within its size category.
++
+You can learn more about the models in their respective model card:
+
+* [Mistral-Nemo](https://aka.ms/azureai/landing/Mistral-Nemo)
++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider-specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral Nemo chat model with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral Nemo chat model can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### A REST client
+
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
+
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral Nemo chat model.
+
+### Create a client to consume the model
+
+When you use the REST API, you don't create a client object. Instead, you send requests directly to the endpoint URL and authenticate each request by passing your key or a Microsoft Entra ID token in the `Authorization` header.
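+
+The examples in this section send JSON bodies using requests shaped like the following sketch, where `<ENDPOINT_URI>` is your endpoint URL and `<TOKEN>` is your key or Microsoft Entra ID token (the route varies by operation):
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```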
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
+
+The response is as follows:
++
+```json
+{
+ "model_name": "Mistral-Nemo",
+ "model_type": "chat-completions",
+ "model_provider_name": "MistralAI"
+}
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Mistral-Nemo",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
+
+You can visualize how streaming generates content:
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Mistral-Nemo",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
+ }
+ ]
+}
+```
+
+The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Mistral-Nemo",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+ "temperature" :0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Mistral-Nemo",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral Nemo chat model can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant that always generate responses in JSON format, using the following format: { \"answer\": \"response\" }"
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "response_format": { "type": "json_object" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718727522,
+ "model": "Mistral-Nemo",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "{\"answer\": \"There are approximately 7,117 living languages in the world today, according to the latest estimates. However, this number can vary as some languages become extinct and others are newly discovered or classified.\"}",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 39,
+ "total_tokens": 87,
+ "completion_tokens": 48
+ }
+}
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "logprobs": true
+}
+```
+
+The following extra parameters can be passed to the Mistral Nemo chat model:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
++
+### Safe mode
+
+The Mistral Nemo chat model supports the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "safemode": true
+}
+```
+
+### Use tools
+
+The Mistral Nemo chat model supports the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system, or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information between two different cities.
++
+```json
+{
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "description": "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates"
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city"
+ }
+ },
+ "required": [
+ "origin_city",
+ "destination_city"
+ ]
+ }
+ }
+}
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
+
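+The tool's output, which is later sent back to the model as a `tool` message, is a JSON payload like the following sketch:
+
+```json
+{
+    "info": "There are no flights available from Miami to Seattle. You should take a train, especially if it helps to reduce CO2 emissions."
+}
+```
+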
+Prompt the model to book flights with the help of this function:
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant that help users to find information about traveling, how to get to places and the different transportations options. You care about the environment and you always have that in mind when answering inqueries"
+ },
+ {
+ "role": "user",
+ "content": "When is the next flight from Miami to Seattle?"
+ }
+ ],
+ "tool_choice": "auto",
+ "tools": [
+ {
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "description": "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates"
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city"
+ }
+ },
+ "required": [
+ "origin_city",
+ "destination_city"
+ ]
+ }
+ }
+ }
+ ]
+}
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726007,
+ "model": "Mistral-Nemo",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {
+ "id": "abc0dF1gh",
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "arguments": "{\"origin_city\": \"Miami\", \"destination_city\": \"Seattle\"}",
+ "call_id": null
+ }
+ }
+ ]
+ },
+ "finish_reason": "tool_calls",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 190,
+ "total_tokens": 226,
+ "completion_tokens": 36
+ }
+}
+```
+
+To continue, append the assistant message (including its tool calls) to the chat history.
+
+Next, call the appropriate function to handle the tool call, and append its result to the chat history as a `tool` message.
+
+Finally, send the updated conversation to the model to view its response:
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant that help users to find information about traveling, how to get to places and the different transportations options. You care about the environment and you always have that in mind when answering inqueries"
+ },
+ {
+ "role": "user",
+ "content": "When is the next flight from Miami to Seattle?"
+ },
+ {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {
+ "id": "abc0DeFgH",
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "arguments": "{\"origin_city\": \"Miami\", \"destination_city\": \"Seattle\"}",
+ "call_id": null
+ }
+ }
+ ]
+ },
+ {
+ "role": "tool",
+ "content": "{ \"info\": \"There are no flights available from Miami to Seattle. You should take a train, specially if it helps to reduce CO2 emissions.\" }",
+ "tool_call_id": "abc0DeFgH"
+ }
+ ],
+ "tool_choice": "auto",
+ "tools": [
+ {
+ "type": "function",
+ "function": {
+ "name": "get_flight_info",
+ "description": "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ "parameters":{
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates"
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city"
+ }
+ },
+ "required": ["origin_city", "destination_city"]
+ }
+ }
+ }
+ ]
+}
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are an AI assistant that helps people find information."
+ },
+ {
+ "role": "user",
+ "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ }
+ ]
+}
+```
++
+```json
+{
+ "error": {
+ "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
+ "type": null,
+ "param": "prompt",
+ "code": "content_filter",
+ "status": 400
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++
+## More inference examples
+
+For more examples of how to use Mistral, see the following examples and tutorials:
+
+| Description | Language | Sample |
+|-|-|--|
+| CURL request | Bash | [Link](https://aka.ms/mistral-large/webrequests-sample) |
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+| Python web requests | Python | [Link](https://aka.ms/mistral-large/webrequests-sample) |
+| OpenAI SDK (experimental) | Python | [Link](https://aka.ms/mistral-large/openaisdk) |
+| LangChain | Python | [Link](https://aka.ms/mistral-large/langchain-sample) |
+| Mistral AI | Python | [Link](https://aka.ms/mistral-large/mistralai-sample) |
+| LiteLLM | Python | [Link](https://aka.ms/mistral-large/litellm-sample) |
++
+## Cost and quota considerations for Mistral family of models deployed as serverless API endpoints
+
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+
+Mistral models deployed as a serverless API are offered by MistralAI through the Azure Marketplace and integrated with Azure AI Studio for use. You can find the Azure Marketplace pricing when deploying the model.
+
+Each time a project subscribes to a given offer from the Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference; however, multiple meters are available to track each scenario independently.
+
+For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
+
+## Related content
++
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Mistral Open https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-mistral-open.md
+
+ Title: How to use Mistral-7B and Mixtral chat models with Azure AI Studio
+
+description: Learn how to use Mistral-7B and Mixtral chat models with Azure AI Studio.
+ Last updated: 08/08/2024
+reviewer: fkriti
+++
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
++
+# How to use Mistral-7B and Mixtral chat models
+
+In this article, you learn about Mistral-7B and Mixtral chat models and how to use them.
+Mistral AI offers two categories of models: premium models, including [Mistral Large and Mistral Small](deploy-models-mistral.md), which are available as serverless APIs with pay-as-you-go token-based billing; and open models, including [Mistral Nemo](deploy-models-mistral-nemo.md) and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](deploy-models-mistral-open.md), which you can also download and run on self-hosted managed endpoints.
+++
+## Mistral-7B and Mixtral chat models
+
+The Mistral-7B and Mixtral chat models include the following models:
+
+# [Mistral-7B-Instruct](#tab/mistral-7b-instruct)
+
+The Mistral-7B-Instruct Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B, a transformer model with the following architecture choices:
+
+* Grouped-Query Attention
+* Sliding-Window Attention
+* Byte-fallback BPE tokenizer
++
+The following models are available:
+
+* [mistralai-Mistral-7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v01)
+* [mistralai-Mistral-7B-Instruct-v02](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v02)
++
+# [Mixtral-8x7B-Instruct](#tab/mistral-8x7B-instruct)
+
+The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.
+
+Mixtral-8x7B-v0.1 is a decoder-only model with eight distinct groups of parameters, or "experts". At every layer, for every token, a router network chooses two of these experts to process the token and combines their outputs additively. Mixtral has 46.7B total parameters but only uses 12.9B parameters per token with this technique; therefore, the model can perform with the same speed and cost as a 12.9B model.
++
+The following models are available:
+
+* [mistralai-Mixtral-8x7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mixtral-8x7B-Instruct-v01)
++
+# [Mixtral-8x22B-Instruct](#tab/mistral-8x22b-instruct)
+
+The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of Mixtral-8x22B-v0.1. The model is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
+
+Mixtral 8x22B comes with the following strengths:
+
+* Fluent in English, French, Italian, German, and Spanish
+* Strong mathematics and coding capabilities
+* Natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale
+* Precise information recall from large documents, due to its 64K-token context window
++
+The following models are available:
+
+* [mistralai-Mixtral-8x22B-Instruct-v0-1](https://aka.ms/azureai/landing/mistralai-Mixtral-8x22B-Instruct-v0-1)
++++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider-specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral-7B and Mixtral chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to a self-hosted managed compute**
+
+Mistral-7B and Mixtral chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
+
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral-7B and Mixtral chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
+
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
++
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.identity import DefaultAzureCredential
+
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=DefaultAzureCredential(),
+)
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```python
+model_info = client.get_model_info()
+```
+
+The response is as follows:
++
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
+
+```console
+Model name: mistralai-Mistral-7B-Instruct-v01
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
+
+> [!NOTE]
+> mistralai-Mistral-7B-Instruct-v01, mistralai-Mistral-7B-Instruct-v02 and mistralai-Mixtral-8x22B-Instruct-v0-1 don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+
+The response is as follows, where you can see the model's usage statistics:
++
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: mistralai-Mistral-7B-Instruct-v01
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
+
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+ """
+ Prints the chat completion with streaming. Some delay is added to simulate
+ a real-time conversation.
+ """
+ import time
+ for update in result:
+ if update.choices:
+ print(update.choices[0].delta.content, end="")
+ time.sleep(0.05)
+```
+
+You can visualize how streaming generates content:
++
+```python
+print_stream(result)
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+> [!WARNING]
+> Mistral doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
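+If you prompt for JSON without `response_format`, validate the output yourself. The following sketch shows one way to do that; the parsing step is illustrative, and the model may still return text that isn't valid JSON:
+
+```python
+import json
+
+response = client.complete(
+    messages=[
+        SystemMessage(content=(
+            "You are a helpful assistant that always answers in JSON, "
+            "using the format: { \"answer\": \"response\" }."
+        )),
+        UserMessage(content="How many languages are in the world?"),
+    ],
+)
+
+content = response.choices[0].message.content
+try:
+    answer = json.loads(content)   # not guaranteed to succeed
+    print(answer["answer"])
+except json.JSONDecodeError:
+    print("The model returned output that isn't valid JSON:", content)
+```
+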
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+The following extra parameters can be passed to Mistral-7B and Mixtral chat models:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `float` |
+| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `int` |
+| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `float` |
+| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
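+
+For example, the following sketch passes several of these parameters through `model_extras`; whether they take effect depends on the specific model deployment:
+
+```python
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful assistant."),
+        UserMessage(content="How many languages are in the world?"),
+    ],
+    model_extras={
+        "logprobs": True,    # return log probabilities for the output tokens
+        "top_logprobs": 3,   # include the 3 most likely tokens at each position
+        "n": 2,              # generate two completion choices for the prompt
+    },
+)
+
+for choice in response.choices:
+    print(choice.index, choice.message.content)
+```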
+++++
+## Mistral-7B and Mixtral chat models
+
+The Mistral-7B and Mixtral chat models include the following models:
+
+# [Mistral-7B-Instruct](#tab/mistral-7b-instruct)
+
+The Mistral-7B-Instruct Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B, a transformer model with the following architecture choices:
+
+* Grouped-Query Attention
+* Sliding-Window Attention
+* Byte-fallback BPE tokenizer
++
+The following models are available:
+
+* [mistralai-Mistral-7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v01)
+* [mistralai-Mistral-7B-Instruct-v02](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v02)
++
+# [Mixtral-8x7B-Instruct](#tab/mistral-8x7B-instruct)
+
+The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.
+
+Mixtral-8x7B-v0.1 is a decoder-only model with eight distinct groups of parameters, or "experts". At every layer, for every token, a router network chooses two of these experts to process the token and combines their outputs additively. Mixtral has 46.7B total parameters but only uses 12.9B parameters per token with this technique; therefore, the model can perform with the same speed and cost as a 12.9B model.
++
+The following models are available:
+
+* [mistralai-Mixtral-8x7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mixtral-8x7B-Instruct-v01)
++
+# [Mixtral-8x22B-Instruct](#tab/mistral-8x22b-instruct)
+
+The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of Mixtral-8x22B-v0.1. The model is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
+
+Mixtral 8x22B comes with the following strengths:
+
+* Fluent in English, French, Italian, German, and Spanish
+* Strong mathematics and coding capabilities
+* Natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale
+* Precise information recall from large documents, due to its 64K-token context window
++
+The following models are available:
+
+* [mistralai-Mixtral-8x22B-Instruct-v0-1](https://aka.ms/azureai/landing/mistralai-Mixtral-8x22B-Instruct-v0-1)
++++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider-specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral-7B and Mixtral chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to a self-hosted managed compute**
+
+Mistral-7B and Mixtral chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral-7B and Mixtral chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { DefaultAzureCredential } from "@azure/identity";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new DefaultAzureCredential()
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: mistralai-Mistral-7B-Instruct-v01
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+> [!NOTE]
+> mistralai-Mistral-7B-Instruct-v01, mistralai-Mistral-7B-Instruct-v02 and mistralai-Mixtral-8x22B-Instruct-v0-1 don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: mistralai-Mistral-7B-Instruct-v01
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
+
+To stream completions, use `.asNodeStream()` when you call the model.
+
+You can visualize how streaming generates content:
++
+```javascript
+// createSseStream requires: import { createSseStream } from "@azure/core-sse";
+var stream = response.body;
+if (!stream) {
+ throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+ throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ presence_penalty: 0.1,
+ frequency_penalty: 0.8,
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+> [!WARNING]
+> Mistral doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+The following extra parameters can be passed to Mistral-7B and Mixtral chat models:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `float` |
+| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
+| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
+| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
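+
+For example, the following sketch (illustrative only; whether the deployed model honors these parameters depends on the model) passes `n`, `logprobs`, and `top_logprobs` through the same `extra-parameters` header shown earlier:
+
+```javascript
+var messages = [
+    { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+    headers: {
+        // Required so the endpoint forwards the extra parameters to the model.
+        "extra-parameters": "pass-through"
+    },
+    body: {
+        messages: messages,
+        n: 2,              // number of completion choices to generate
+        logprobs: true,    // must be true when top_logprobs is used
+        top_logprobs: 3    // three most likely tokens per position
+    }
+});
+```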
+++++
+## Mistral-7B and Mixtral chat models
+
+The Mistral-7B and Mixtral chat models include the following models:
+
+# [Mistral-7B-Instruct](#tab/mistral-7b-instruct)
+
+The Mistral-7B-Instruct Large Language Model (LLM) is an instruct, fine-tuned version of the Mistral-7B, a transformer model with the following architecture choices:
+
+* Grouped-Query Attention
+* Sliding-Window Attention
+* Byte-fallback BPE tokenizer
++
+The following models are available:
+
+* [mistralai-Mistral-7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v01)
+* [mistralai-Mistral-7B-Instruct-v02](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v02)
++
+# [Mixtral-8x7B-Instruct](#tab/mistral-8x7B-instruct)
+
+The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.
+
+Mixtral-8x7B-v0.1 is a decoder-only model with eight distinct groups of parameters, known as "experts." At every layer, for every token, a router network chooses two of these experts to process the token and combines their outputs additively. Mixtral has 46.7B total parameters but only uses 12.9B parameters per token with this technique; therefore, the model performs with the same speed and cost as a 12.9B model.
++
+The following models are available:
+
+* [mistralai-Mixtral-8x7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mixtral-8x7B-Instruct-v01)
++
+# [Mixtral-8x22B-Instruct](#tab/mistral-8x22b-instruct)
+
+The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct, fine-tuned version of the Mixtral-8x22B-v0.1. The model is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
+
+Mixtral 8x22B comes with the following strengths:
+
+* Fluent in English, French, Italian, German, and Spanish
+* Strong mathematics and coding capabilities
+* Natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale
+* Precise information recall from large documents, due to its 64K-token context window
++
+The following models are available:
+
+* [mistralai-Mixtral-8x22B-Instruct-v0-1](https://aka.ms/azureai/landing/mistralai-Mixtral-8x22B-Instruct-v0-1)
++++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral-7B and Mixtral chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to a self-hosted managed compute**
+
+Mistral-7B and Mixtral chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you may not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral-7B and Mixtral chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
++
+```csharp
+client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new DefaultAzureCredential(includeInteractiveCredentials: true)
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: mistralai-Mistral-7B-Instruct-v01
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+> [!NOTE]
+> mistralai-Mistral-7B-Instruct-v01, mistralai-Mistral-7B-Instruct-v02 and mistralai-Mixtral-8x22B-Instruct-v0-1 don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: mistralai-Mistral-7B-Instruct-v01
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
+
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
+}
+```
+
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that in this example, the call is wrapped in an asynchronous method.
+
+To visualize the output, define an asynchronous method to print the stream in the console.
+
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
+{
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
+ {
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
+ }
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
+}
+```
+
+You can visualize how streaming generates content:
++
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+> [!WARNING]
+> Mistral doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+The following extra parameters can be passed to Mistral-7B and Mixtral chat models:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `float` |
+| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
+| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
+| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
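+
+As an illustration only (parameter support varies by model and deployment), the following sketch passes `n`, `logprobs`, and `top_logprobs` through `AdditionalProperties` with pass-through enabled, using the same pattern as the earlier example:
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+    Messages = {
+        new ChatRequestUserMessage("How many languages are in the world?")
+    },
+    AdditionalProperties = {
+        { "n", BinaryData.FromString("2") },            // number of completion choices to generate
+        { "logprobs", BinaryData.FromString("true") },  // must be true when top_logprobs is used
+        { "top_logprobs", BinaryData.FromString("3") }  // three most likely tokens per position
+    },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```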
+++++
+## Mistral-7B and Mixtral chat models
+
+The Mistral-7B and Mixtral chat models include the following models:
+
+# [Mistral-7B-Instruct](#tab/mistral-7b-instruct)
+
+The Mistral-7B-Instruct Large Language Model (LLM) is an instruct, fine-tuned version of the Mistral-7B, a transformer model with the following architecture choices:
+
+* Grouped-Query Attention
+* Sliding-Window Attention
+* Byte-fallback BPE tokenizer
++
+The following models are available:
+
+* [mistralai-Mistral-7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v01)
+* [mistralai-Mistral-7B-Instruct-v02](https://aka.ms/azureai/landing/mistralai-Mistral-7B-Instruct-v02)
++
+# [Mixtral-8x7B-Instruct](#tab/mistral-8x7B-instruct)
+
+The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks with 6x faster inference.
+
+Mixtral-8x7B-v0.1 is a decoder-only model with eight distinct groups of parameters, known as "experts." At every layer, for every token, a router network chooses two of these experts to process the token and combines their outputs additively. Mixtral has 46.7B total parameters but only uses 12.9B parameters per token with this technique; therefore, the model performs with the same speed and cost as a 12.9B model.
++
+The following models are available:
+
+* [mistralai-Mixtral-8x7B-Instruct-v01](https://aka.ms/azureai/landing/mistralai-Mixtral-8x7B-Instruct-v01)
++
+# [Mixtral-8x22B-Instruct](#tab/mistral-8x22b-instruct)
+
+The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct, fine-tuned version of the Mixtral-8x22B-v0.1. The model is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
+
+Mixtral 8x22B comes with the following strengths:
+
+* Fluent in English, French, Italian, German, and Spanish
+* Strong mathematics and coding capabilities
+* Natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale
+* Precise information recall from large documents, due to its 64K-token context window
++
+The following models are available:
+
+* [mistralai-Mixtral-8x22B-Instruct-v0-1](https://aka.ms/azureai/landing/mistralai-Mixtral-8x22B-Instruct-v0-1)
++++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral-7B and Mixtral chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to a self-hosted managed compute**
+
+Mistral-7B and Mixtral chat models can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
+
+For deployment to a self-hosted managed compute, you must have enough quota in your subscription. If you don't have enough quota available, you can use our temporary quota access by selecting the option **I want to use shared quota and I acknowledge that this endpoint will be deleted in 168 hours.**
+
+> [!div class="nextstepaction"]
+> [Deploy the model to managed compute](../concepts/deployments-overview.md)
+
+### A REST client
+
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
+
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral-7B and Mixtral chat models.
+
+### Create a client to consume the model
+
+When you use the REST API, you don't create a client object. Instead, you send each request directly to the endpoint URL and authenticate it with your key.
+
+When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, authenticate requests with a Microsoft Entra ID access token instead of a key.
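+
+The following sketch shows the general shape of a request; the remaining examples in this section show only the request body. Depending on your deployment, the `Authorization` header typically carries either your key or a Microsoft Entra ID access token.
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```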
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
+
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
+
+The response is as follows:
++
+```json
+{
+ "model_name": "mistralai-Mistral-7B-Instruct-v01",
+ "model_type": "chat-completions",
+ "model_provider_name": "MistralAI"
+}
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
+
+> [!NOTE]
+> mistralai-Mistral-7B-Instruct-v01, mistralai-Mistral-7B-Instruct-v02 and mistralai-Mixtral-8x22B-Instruct-v0-1 don't support system messages (`role="system"`). When you use the Azure AI model inference API, system messages are translated to user messages, which is the closest capability available. This translation is offered for convenience, but it's important for you to verify that the model is following the instructions in the system message with the right level of confidence.
+
+The response is as follows, where you can see the model's usage statistics:
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "mistralai-Mistral-7B-Instruct-v01",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
+
+You can visualize how streaming generates content:
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "mistralai-Mistral-7B-Instruct-v01",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
+ }
+ ]
+}
+```
+
+The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
++
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "mistralai-Mistral-7B-Instruct-v01",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+ "temperature" :0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
++
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "mistralai-Mistral-7B-Instruct-v01",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
+
+> [!WARNING]
+> Mistral doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
++
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "logprobs": true
+}
+```
+
+The following extra parameters can be passed to Mistral-7B and Mixtral chat models:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `float` |
+| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
+| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
+| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
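+
+For example (illustrative only; parameter support varies by model), a request body that combines these parameters with the `extra-parameters: pass-through` header shown earlier might look like the following:
+
+```json
+{
+    "messages": [
+        {
+            "role": "user",
+            "content": "How many languages are in the world?"
+        }
+    ],
+    "n": 2,
+    "logprobs": true,
+    "top_logprobs": 3
+}
+```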
+++
+## More inference examples
+
+For more examples of how to use Mistral, see the following examples and tutorials:
+
+| Description | Language | Sample |
+|-|-|--|
+| CURL request | Bash | [Link](https://aka.ms/mistral-large/webrequests-sample) |
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
+| Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
+| Python web requests | Python | [Link](https://aka.ms/mistral-large/webrequests-sample) |
+| OpenAI SDK (experimental) | Python | [Link](https://aka.ms/mistral-large/openaisdk) |
+| LangChain | Python | [Link](https://aka.ms/mistral-large/langchain-sample) |
+| Mistral AI | Python | [Link](https://aka.ms/mistral-large/mistralai-sample) |
+| LiteLLM | Python | [Link](https://aka.ms/mistral-large/litellm-sample) |
++
+## Cost and quota considerations for Mistral family of models deployed to managed compute
+
+Mistral models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.
+
+It is a good practice to start with a low number of instances and scale up as needed. You can monitor the cost of the compute instance in the Azure portal.
+
+## Related content
++
+* [Azure AI Model Inference API](../reference/reference-model-inference-api.md)
+* [Deploy models as serverless APIs](deploy-models-serverless.md)
+* [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
+* [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
ai-studio Deploy Models Mistral https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/ai-studio/how-to/deploy-models-mistral.md
Title: How to deploy Mistral family of models with Azure AI Studio
+ Title: How to use Mistral premium chat models with Azure AI Studio
-description: Learn how to deploy Mistral family of models with Azure AI Studio.
-
+description: Learn how to use Mistral premium chat models with Azure AI Studio.
+ Previously updated : 5/21/2024 Last updated : 08/08/2024 reviewer: fkriti -+
+zone_pivot_groups: azure-ai-model-catalog-samples-chat
++
+# How to use Mistral premium chat models
+
+In this article, you learn about Mistral premium chat models and how to use them.
+Mistral AI offers two categories of models. Premium models, including [Mistral Large and Mistral Small](deploy-models-mistral.md), are available as serverless APIs with pay-as-you-go token-based billing. Open models, including [Mistral Nemo](deploy-models-mistral-nemo.md) and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](deploy-models-mistral-open.md), are also available to download and run on self-hosted managed endpoints.
+++
+## Mistral premium chat models
+
+The Mistral premium chat models include the following models:
+
+# [Mistral Large](#tab/mistral-large)
+
+Mistral Large is Mistral AI's most advanced Large Language Model (LLM). It can be used on any language-based task, thanks to its state-of-the-art reasoning and knowledge capabilities.
+
+Additionally, Mistral Large is:
+
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model and extra safety layer with the safe_mode option.
+
+And attributes of Mistral Large (2407) include:
+
+* **Multi-lingual by design**. Supports dozens of languages, including English, French, German, Spanish, and Italian.
+* **Proficient in coding**. Trained on more than 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
+* **Agent-centric**. Possesses agentic capabilities with native function calling and JSON outputting.
+* **Advanced in reasoning**. Demonstrates state-of-the-art mathematical and reasoning capabilities.
++
+The following models are available:
+
+* [Mistral-Large](https://aka.ms/azureai/landing/Mistral-Large)
+* [Mistral-Large-2407](https://aka.ms/azureai/landing/Mistral-Large-2407)
++
+# [Mistral Small](#tab/mistral-small)
+
+Mistral Small is Mistral AI's most efficient Large Language Model (LLM). It can be used on any language-based task that requires high efficiency and low latency.
+
+Mistral Small is:
+
+* **A small model optimized for low latency**. Efficient for high volume and low latency workloads. Mistral Small is Mistral's smallest proprietary model; it outperforms Mixtral-8x7B and has lower latency.
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model, and extra safety layer with the safe_mode option.
++
+The following models are available:
+
+* [Mistral-Small](https://aka.ms/azureai/landing/Mistral-Small)
++++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for use with specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral premium chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral premium chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `azure-ai-inference` package with Python. To install this package, you need the following prerequisites:
+
+* Python 3.8 or later installed, including pip.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference package with the following command:
+
+```bash
+pip install azure-ai-inference
+```
+
+Read more about the [Azure AI inference package and reference](https://aka.ms/azsdk/azure-ai-inference/python/reference).
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral premium chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.core.credentials import AzureKeyCredential
+
+client = ChatCompletionsClient(
+ endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+ credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
+)
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```python
+model_info = client.get_model_info()
+```
+
+The response is as follows:
++
+```python
+print("Model name:", model_info.model_name)
+print("Model type:", model_info.model_type)
+print("Model provider name:", model_info.model_provider)
+```
+
+```console
+Model name: Mistral-Large
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```python
+from azure.ai.inference.models import SystemMessage, UserMessage
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+)
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```python
+print("Response:", response.choices[0].message.content)
+print("Model:", response.model)
+print("Usage:")
+print("\tPrompt tokens:", response.usage.prompt_tokens)
+print("\tTotal tokens:", response.usage.total_tokens)
+print("\tCompletion tokens:", response.usage.completion_tokens)
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Mistral-Large
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```python
+result = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ temperature=0,
+ top_p=1,
+ max_tokens=2048,
+ stream=True,
+)
+```
+
+To stream completions, set `stream=True` when you call the model.
+
+To visualize the output, define a helper function to print the stream.
+
+```python
+def print_stream(result):
+    """
+    Prints the chat completion with streaming. Some delay is added to simulate
+    a real-time conversation.
+    """
+    import time
+    for update in result:
+        if update.choices:
+            print(update.choices[0].delta.content, end="")
+        time.sleep(0.05)
+```
+
+You can visualize how streaming generates content:
++
+```python
+print_stream(result)
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```python
+from azure.ai.inference.models import ChatCompletionsResponseFormat
+
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ presence_penalty=0.1,
+ frequency_penalty=0.8,
+ max_tokens=2048,
+ stop=["<|endoftext|>"],
+ temperature=0,
+ top_p=1,
+ response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+)
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral premium chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant that always generate responses in JSON format, using."
+ " the following format: { ""answer"": ""response"" }."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
+)
+```
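+
+As a minimal sketch (assuming the `response` object from the previous call), you can check `finish_reason` before parsing the output, so that a truncated response isn't treated as valid JSON:
+
+```python
+import json
+
+choice = response.choices[0]
+
+# If generation stopped because it hit max_tokens, the JSON output may be cut off.
+if choice.finish_reason == "length":
+    print("Output was truncated; consider increasing max_tokens.")
+else:
+    print(json.loads(choice.message.content))
+```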
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "logprobs": True
+ }
+)
+```
+
+The following extra parameters can be passed to Mistral premium chat models:
+
+| Name | Description | Type |
+| --- | --- | --- |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
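+
+For example, `ignore_eos` can be passed the same way as `logprobs` in the previous example. This is a sketch only; whether the model honors the parameter depends on the deployment:
+
+```python
+response = client.complete(
+    messages=[
+        UserMessage(content="How many languages are in the world?"),
+    ],
+    model_extras={
+        "ignore_eos": True  # continue generating tokens after the EOS token is produced
+    }
+)
+```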
++
+### Safe mode
+
+Mistral premium chat models support the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
++
+```python
+response = client.complete(
+ messages=[
+ SystemMessage(content="You are a helpful assistant."),
+ UserMessage(content="How many languages are in the world?"),
+ ],
+ model_extras={
+ "safe_mode": True
+ }
+)
+```
+
+### Use tools
+
+Mistral premium chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that can look up flight information between two different cities.
++
+```python
+from azure.ai.inference.models import FunctionDefinition, ChatCompletionsFunctionToolDefinition
+
+flight_info = ChatCompletionsFunctionToolDefinition(
+ function=FunctionDefinition(
+ name="get_flight_info",
+ description="Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ parameters={
+ "type": "object",
+ "properties": {
+ "origin_city": {
+ "type": "string",
+ "description": "The name of the city where the flight originates",
+ },
+ "destination_city": {
+ "type": "string",
+ "description": "The flight destination city",
+ },
+ },
+ "required": ["origin_city", "destination_city"],
+ },
+ )
+)
+
+tools = [flight_info]
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```python
+def get_flight_info(loc_origin: str, loc_destination: str):
+ return {
+ "info": f"There are no flights available from {loc_origin} to {loc_destination}. You should take a train, specially if it helps to reduce CO2 emissions."
+ }
+```
+
+Prompt the model to book flights with the help of this function:
++
+```python
+messages = [
+ SystemMessage(
+ content="You are a helpful assistant that help users to find information about traveling, how to get"
+ " to places and the different transportations options. You care about the environment and you"
+ " always have that in mind when answering inqueries.",
+ ),
+ UserMessage(
+ content="When is the next flight from Miami to Seattle?",
+ ),
+]
+
+response = client.complete(
+ messages=messages, tools=tools, tool_choice="auto"
+)
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```python
+response_message = response.choices[0].message
+tool_calls = response_message.tool_calls
+
+print("Finish reason:", response.choices[0].finish_reason)
+print("Tool call:", tool_calls)
+```
+
+To continue, append this message to the chat history:
++
+```python
+messages.append(
+ response_message
+)
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```python
+import json
+from azure.ai.inference.models import ToolMessage
+
+for tool_call in tool_calls:
+
+ # Get the tool details:
+
+ function_name = tool_call.function.name
+ function_args = json.loads(tool_call.function.arguments.replace("\'", "\""))
+ tool_call_id = tool_call.id
+
+ print(f"Calling function `{function_name}` with arguments {function_args}")
+
+ # Call the function defined above using `locals()`, which returns a dictionary of the names
+ # available in the current scope. This is just a simple way to get the function callable from
+ # its string name. Then we can call it with the corresponding arguments.
+
+ callable_func = locals()[function_name]
+ function_response = callable_func(**function_args)
+
+ print("->", function_response)
+
+ # Once we have a response from the function and its arguments, we can append a new message to the chat
+ # history. Notice how we are telling the model that this chat message came from a tool:
+
+ messages.append(
+ ToolMessage(
+ tool_call_id=tool_call_id,
+ content=json.dumps(function_response)
+ )
+ )
+```
+
+View the response from the model:
++
+```python
+response = client.complete(
+ messages=messages,
+ tools=tools,
+)
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```python
+from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
+from azure.core.exceptions import HttpResponseError
+
+try:
+    response = client.complete(
+        messages=[
+            SystemMessage(content="You are an AI assistant that helps people find information."),
+            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
+        ]
+    )
+
+    print(response.choices[0].message.content)
+
+except HttpResponseError as ex:
+    if ex.status_code == 400:
+        response = ex.response.json()
+        if isinstance(response, dict) and "error" in response:
+            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
+        else:
+            raise
+    else:
+        raise
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Mistral premium chat models
+
+The Mistral premium chat models include the following models:
+
+# [Mistral Large](#tab/mistral-large)
+
+Mistral Large is Mistral AI's most advanced Large Language Model (LLM). It can be used on any language-based task, thanks to its state-of-the-art reasoning and knowledge capabilities.
+
+Additionally, Mistral Large is:
+
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model and extra safety layer with the safe_mode option.
+
+And attributes of Mistral Large (2407) include:
+
+* **Multi-lingual by design**. Supports dozens of languages, including English, French, German, Spanish, and Italian.
+* **Proficient in coding**. Trained on more than 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
+* **Agent-centric**. Possesses agentic capabilities with native function calling and JSON outputting.
+* **Advanced in reasoning**. Demonstrates state-of-the-art mathematical and reasoning capabilities.
++
+The following models are available:
+
+* [Mistral-Large](https://aka.ms/azureai/landing/Mistral-Large)
+* [Mistral-Large-2407](https://aka.ms/azureai/landing/Mistral-Large-2407)
++
+# [Mistral Small](#tab/mistral-small)
+
+Mistral Small is Mistral AI's most efficient Large Language Model (LLM). It can be used on any language-based task that requires high efficiency and low latency.
+
+Mistral Small is:
+
+* **A small model optimized for low latency**. Efficient for high volume and low latency workloads. Mistral Small is Mistral's smallest proprietary model; it outperforms Mixtral-8x7B and has lower latency.
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model, and extra safety layer with the safe_mode option.
++
+The following models are available:
+
+* [Mistral-Small](https://aka.ms/azureai/landing/Mistral-Small)
++
-# How to deploy Mistral models with Azure AI Studio
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral premium chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral premium chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `@azure-rest/ai-inference` package from `npm`. To install this package, you need the following prerequisites:
+
+* LTS versions of `Node.js` with `npm`.
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure Inference library for JavaScript with the following command:
+
+```bash
+npm install @azure-rest/ai-inference
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral premium chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { AzureKeyCredential } from "@azure/core-auth";
+
+const client = new ModelClient(
+ process.env.AZURE_INFERENCE_ENDPOINT,
+ new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
+);
+```
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```javascript
+var model_info = await client.path("/info").get()
+```
+
+The response is as follows:
++
+```javascript
+console.log("Model name: ", model_info.body.model_name)
+console.log("Model type: ", model_info.body.model_type)
+console.log("Model provider name: ", model_info.body.model_provider_name)
+```
+
+```console
+Model name: Mistral-Large
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+});
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```javascript
+if (isUnexpected(response)) {
+ throw response.body.error;
+}
+
+console.log("Response: ", response.body.choices[0].message.content);
+console.log("Model: ", response.body.model);
+console.log("Usage:");
+console.log("\tPrompt tokens:", response.body.usage.prompt_tokens);
+console.log("\tTotal tokens:", response.body.usage.total_tokens);
+console.log("\tCompletion tokens:", response.body.usage.completion_tokens);
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Mistral-Large
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+}).asNodeStream();
+```
+
+To stream completions, use `.asNodeStream()` when you call the model.
+
+You can visualize how streaming generates content:
++
+```javascript
+import { createSseStream } from "@azure/core-sse";
+
+var stream = response.body;
+if (!stream) {
+    throw new Error(`Failed to get chat completions with status: ${response.status}`);
+}
+
+if (response.status !== "200") {
+    throw new Error(`Failed to get chat completions: ${response.body.error}`);
+}
+
+// createSseStream converts the response body into an async iterable of server-sent events.
+var sses = createSseStream(stream);
+
+for await (const event of sses) {
+ if (event.data === "[DONE]") {
+ return;
+ }
+ for (const choice of (JSON.parse(event.data)).choices) {
+ console.log(choice.delta?.content ?? "");
+ }
+}
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ presence_penalty: "0.1",
+ frequency_penalty: "0.8",
+ max_tokens: 2048,
+ stop: ["<|endoftext|>"],
+ temperature: 0,
+ top_p: 1,
+ response_format: { type: "text" },
+ }
+});
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral premium chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant that always generate responses in JSON format, using."
+ + " the following format: { \"answer\": \"response\" }." },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ response_format: { type: "json_object" }
+ }
+});
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ logprobs: true
+ }
+});
+```
+
+The following extra parameters can be passed to Mistral premium chat models:
+
+| Name | Description | Type |
+| ---- | ----------- | ---- |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
++
+### Safe mode
+
+Mistral premium chat models support the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
++
+```javascript
+var messages = [
+ { role: "system", content: "You are a helpful assistant" },
+ { role: "user", content: "How many languages are in the world?" },
+];
+
+var response = await client.path("/chat/completions").post({
+ headers: {
+ "extra-params": "pass-through"
+ },
+ body: {
+ messages: messages,
+ safe_mode: true
+ }
+});
+```
+
+### Use tools
+
+Mistral premium chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that is able to look for flight information between two different cities.
++
+```javascript
+const flight_info = {
+ name: "get_flight_info",
+ description: "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ parameters: {
+ type: "object",
+ properties: {
+ origin_city: {
+ type: "string",
+ description: "The name of the city where the flight originates",
+ },
+ destination_city: {
+ type: "string",
+ description: "The flight destination city",
+ },
+ },
+ required: ["origin_city", "destination_city"],
+ },
+}
+
+const tools = [
+ {
+ type: "function",
+ function: flight_info,
+ },
+];
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
++
+```javascript
+function get_flight_info(loc_origin, loc_destination) {
+ return {
+ info: "There are no flights available from " + loc_origin + " to " + loc_destination + ". You should take a train, specially if it helps to reduce CO2 emissions."
+ }
+}
+```
+
+Prompt the model to book flights with the help of this function:
++
+```javascript
+var result = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ tools: tools,
+ tool_choice: "auto"
+ }
+});
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```javascript
+const response_message = response.body.choices[0].message;
+const tool_calls = response_message.tool_calls;
+
+console.log("Finish reason: " + response.body.choices[0].finish_reason);
+console.log("Tool call: " + tool_calls);
+```
+
+To continue, append this message to the chat history:
++
+```javascript
+messages.push(response_message);
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```javascript
+function applyToolCall({ function: call }) {
+    // Get the tool details:
+    const tool_params = JSON.parse(call.arguments);
+    console.log("Calling function " + call.name + " with arguments " + JSON.stringify(tool_params));
+
+    // Look up the function from its string name. Keeping the available functions in a map is
+    // just a simple way to resolve the callable; then we can call it with the corresponding arguments.
+    const available_functions = { get_flight_info };
+    const function_response = available_functions[call.name](tool_params.origin_city, tool_params.destination_city);
+    console.log("-> " + JSON.stringify(function_response));
+
+    return function_response;
+}
+
+for (const tool_call of tool_calls) {
+    var tool_response = applyToolCall(tool_call);
+
+    messages.push(
+        {
+            role: "tool",
+            tool_call_id: tool_call.id,
+            content: JSON.stringify(tool_response)
+        }
+    );
+}
+```
+
+View the response from the model:
++
+```javascript
+var result = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ tools: tools,
+ }
+});
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
++
+```javascript
+try {
+ var messages = [
+ { role: "system", content: "You are an AI assistant that helps people find information." },
+ { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
+ ];
+
+ var response = await client.path("/chat/completions").post({
+ body: {
+ messages: messages,
+ }
+ });
+
+ console.log(response.body.choices[0].message.content);
+}
+catch (error) {
+ if (error.status_code == 400) {
+ var response = JSON.parse(error.response._content);
+ if (response.error) {
+ console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
+ }
+ else
+ {
+ throw error;
+ }
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
++++
+## Mistral premium chat models
+
+The Mistral premium chat models include the following models:
+
+# [Mistral Large](#tab/mistral-large)
+
+Mistral Large is Mistral AI's most advanced Large Language Model (LLM). It can be used on any language-based task, thanks to its state-of-the-art reasoning and knowledge capabilities.
+
+Additionally, Mistral Large is:
+
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model and extra safety layer with the safe_mode option.
+
+And attributes of Mistral Large (2407) include:
+
+* **Multi-lingual by design**. Supports dozens of languages, including English, French, German, Spanish, and Italian.
+* **Proficient in coding**. Trained on more than 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
+* **Agent-centric**. Possesses agentic capabilities with native function calling and JSON outputting.
+* **Advanced in reasoning**. Demonstrates state-of-the-art mathematical and reasoning capabilities.
++
+The following models are available:
+
+* [Mistral-Large](https://aka.ms/azureai/landing/Mistral-Large)
+* [Mistral-Large-2407](https://aka.ms/azureai/landing/Mistral-Large-2407)
++
+# [Mistral Small](#tab/mistral-small)
+
+Mistral Small is Mistral AI's most efficient Large Language Model (LLM). It can be used on any language-based task that requires high efficiency and low latency.
+
+Mistral Small is:
+
+* **A small model optimized for low latency**. Efficient for high volume and low latency workloads. Mistral Small is Mistral's smallest proprietary model; it outperforms Mixtral-8x7B and has lower latency.
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model, and extra safety layer with the safe_mode option.
++
+The following models are available:
+
+* [Mistral-Small](https://aka.ms/azureai/landing/Mistral-Small)
++++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
+
+## Prerequisites
+
+To use Mistral premium chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral premium chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
+
+### The inference package installed
+
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
+
+* The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
+
+Once you have these prerequisites, install the Azure AI inference library with the following command:
+
+```dotnetcli
+dotnet add package Azure.AI.Inference --prerelease
+```
+
+You can also authenticate with Microsoft Entra ID (formerly Azure Active Directory). To use credential providers provided with the Azure SDK, install the `Azure.Identity` package:
+
+```dotnetcli
+dotnet add package Azure.Identity
+```
+
+Import the following namespaces:
++
+```csharp
+using Azure;
+using Azure.Identity;
+using Azure.AI.Inference;
+```
+
+This example also uses the following namespaces, but you may not always need them:
++
+```csharp
+using System.Text.Json;
+using System.Text.Json.Serialization;
+using System.Reflection;
+```
+
+## Work with chat completions
+
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
+
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral premium chat models.
+
+### Create a client to consume the model
+
+First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
++
+```csharp
+ChatCompletionsClient client = new ChatCompletionsClient(
+ new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+ new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
+);
+```
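+
+You can also create the client with Microsoft Entra ID by using the `Azure.Identity` package installed earlier. The following is a minimal sketch; it assumes that your deployment is configured for Microsoft Entra ID authentication and that the prerelease package exposes a token-credential constructor.
+
+```csharp
+// DefaultAzureCredential tries the available credential sources in order (environment variables,
+// managed identity, Azure CLI, and so on) and uses the first one that succeeds.
+ChatCompletionsClient client = new ChatCompletionsClient(
+    new Uri(Environment.GetEnvironmentVariable("AZURE_INFERENCE_ENDPOINT")),
+    new DefaultAzureCredential()
+);
+```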
+
+### Get the model's capabilities
+
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
++
+```csharp
+Response<ModelInfo> modelInfo = client.GetModelInfo();
+```
+
+The response is as follows:
++
+```csharp
+Console.WriteLine($"Model name: {modelInfo.Value.ModelName}");
+Console.WriteLine($"Model type: {modelInfo.Value.ModelType}");
+Console.WriteLine($"Model provider name: {modelInfo.Value.ModelProviderName}");
+```
+
+```console
+Model name: Mistral-Large
+Model type: chat-completions
+Model provider name: MistralAI
+```
+
+### Create a chat completion request
+
+The following example shows how you can create a basic chat completions request to the model.
+
+```csharp
+ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+};
+
+Response<ChatCompletions> response = client.Complete(requestOptions);
+```
+
+The response is as follows, where you can see the model's usage statistics:
++
+```csharp
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+Console.WriteLine($"Model: {response.Value.Model}");
+Console.WriteLine("Usage:");
+Console.WriteLine($"\tPrompt tokens: {response.Value.Usage.PromptTokens}");
+Console.WriteLine($"\tTotal tokens: {response.Value.Usage.TotalTokens}");
+Console.WriteLine($"\tCompletion tokens: {response.Value.Usage.CompletionTokens}");
+```
+
+```console
+Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
+Model: Mistral-Large
+Usage:
+ Prompt tokens: 19
+ Total tokens: 91
+ Completion tokens: 72
+```
+
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
+
+#### Stream content
+
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
+
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
++
+```csharp
+static async Task StreamMessageAsync(ChatCompletionsClient client)
+{
+ ChatCompletionsOptions requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world? Write an essay about it.")
+ },
+ MaxTokens=4096
+ };
+
+ StreamingResponse<StreamingChatCompletionsUpdate> streamResponse = await client.CompleteStreamingAsync(requestOptions);
+
+ await PrintStream(streamResponse);
+}
+```
+
+To stream completions, use the `CompleteStreamingAsync` method when you call the model. Notice that in this example, the call is wrapped in an asynchronous method.
+
+To visualize the output, define an asynchronous method to print the stream in the console.
+
+```csharp
+static async Task PrintStream(StreamingResponse<StreamingChatCompletionsUpdate> response)
+{
+ await foreach (StreamingChatCompletionsUpdate chatUpdate in response)
+ {
+ if (chatUpdate.Role.HasValue)
+ {
+ Console.Write($"{chatUpdate.Role.Value.ToString().ToUpperInvariant()}: ");
+ }
+ if (!string.IsNullOrEmpty(chatUpdate.ContentUpdate))
+ {
+ Console.Write(chatUpdate.ContentUpdate);
+ }
+ }
+}
+```
+
+You can visualize how streaming generates content:
++
+```csharp
+StreamMessageAsync(client).GetAwaiter().GetResult();
+```
+
+#### Explore more parameters supported by the inference client
+
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ PresencePenalty = 0.1f,
+ FrequencyPenalty = 0.8f,
+ MaxTokens = 2048,
+ StopSequences = { "<|endoftext|>" },
+ Temperature = 0,
+ NucleusSamplingFactor = 1,
+ ResponseFormat = new ChatCompletionsResponseFormatText()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral premium chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage(
+ "You are a helpful assistant that always generate responses in JSON format, " +
+ "using. the following format: { \"answer\": \"response\" }."
+ ),
+ new ChatRequestUserMessage(
+ "How many languages are in the world?"
+ )
+ },
+ ResponseFormat = new ChatCompletionsResponseFormatJSON()
+};
+
+response = client.Complete(requestOptions);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+### Pass extra parameters to the model
+
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "logprobs", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+The following extra parameters can be passed to Mistral premium chat models:
+
+| Name | Description | Type |
+| ---- | ----------- | ---- |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
++
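+For example, the following sketch passes `ignore_eos` with the same pass-through mechanism used above for `logprobs`. Whether the model honors the parameter depends on the deployment, so treat it as illustrative rather than guaranteed behavior.
+
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+    Messages = {
+        new ChatRequestSystemMessage("You are a helpful assistant."),
+        new ChatRequestUserMessage("How many languages are in the world?")
+    },
+    // Bound the generation, since ignore_eos asks the model to keep generating past the EOS token.
+    MaxTokens = 512,
+    AdditionalProperties = { { "ignore_eos", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+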
+### Safe mode
+
+Mistral premium chat models support the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
++
+```csharp
+requestOptions = new ChatCompletionsOptions()
+{
+ Messages = {
+ new ChatRequestSystemMessage("You are a helpful assistant."),
+ new ChatRequestUserMessage("How many languages are in the world?")
+ },
+ AdditionalProperties = { { "safe_mode", BinaryData.FromString("true") } },
+};
+
+response = client.Complete(requestOptions, extraParams: ExtraParameters.PassThrough);
+Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
+```
+
+### Use tools
+
+Mistral premium chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
+
+The following code example creates a tool definition that is able to look for flight information between two different cities.
++
+```csharp
+FunctionDefinition flightInfoFunction = new FunctionDefinition("getFlightInfo")
+{
+ Description = "Returns information about the next flight between two cities. This includes the name of the airline, flight number and the date and time of the next flight",
+ Parameters = BinaryData.FromObjectAsJson(new
+ {
+ Type = "object",
+ Properties = new
+ {
+ origin_city = new
+ {
+ Type = "string",
+ Description = "The name of the city where the flight originates"
+ },
+ destination_city = new
+ {
+ Type = "string",
+ Description = "The flight destination city"
+ }
+ }
+ },
+ new JsonSerializerOptions() { PropertyNamingPolicy = JsonNamingPolicy.CamelCase }
+ )
+};
+
+ChatCompletionsFunctionToolDefinition getFlightTool = new ChatCompletionsFunctionToolDefinition(flightInfoFunction);
+```
+
+In this example, the function's output is that there are no flights available for the selected route, but the user should consider taking a train.
-In this article, you learn how to use Azure AI Studio to deploy the Mistral family of models as serverless APIs with pay-as-you-go token-based billing.
-Mistral AI offers two categories of models in the [Azure AI Studio](https://ai.azure.com). These models are available in the [model catalog](model-catalog-overview.md):
+```csharp
+static string getFlightInfo(string loc_origin, string loc_destination)
+{
+ return JsonSerializer.Serialize(new
+ {
+ info = $"There are no flights available from {loc_origin} to {loc_destination}. You " +
+ "should take a train, specially if it helps to reduce CO2 emissions."
+ });
+}
+```
+
+Prompt the model to book flights with the help of this function:
++
+```csharp
+var chatHistory = new List<ChatRequestMessage>(){
+ new ChatRequestSystemMessage(
+ "You are a helpful assistant that help users to find information about traveling, " +
+ "how to get to places and the different transportations options. You care about the" +
+ "environment and you always have that in mind when answering inqueries."
+ ),
+ new ChatRequestUserMessage("When is the next flight from Miami to Seattle?")
+ };
+
+requestOptions = new ChatCompletionsOptions(chatHistory);
+requestOptions.Tools.Add(getFlightTool);
+requestOptions.ToolChoice = ChatCompletionsToolChoice.Auto;
+
+response = client.Complete(requestOptions);
+```
+
+You can inspect the response to find out if a tool needs to be called. Inspect the finish reason to determine if the tool should be called. Remember that multiple tool types can be indicated. This example demonstrates a tool of type `function`.
++
+```csharp
+var responseMessage = response.Value.Choices[0].Message;
+var toolsCall = responseMessage.ToolCalls;
+
+Console.WriteLine($"Finish reason: {response.Value.Choices[0].FinishReason}");
+Console.WriteLine($"Tool call: {toolsCall[0].Id}");
+```
+
+To continue, append this message to the chat history:
++
+```csharp
+requestOptions.Messages.Add(new ChatRequestAssistantMessage(response.Value.Choices[0].Message));
+```
+
+Now, it's time to call the appropriate function to handle the tool call. The following code snippet iterates over all the tool calls indicated in the response and calls the corresponding function with the appropriate parameters. The response is also appended to the chat history.
++
+```csharp
+foreach (ChatCompletionsToolCall tool in toolsCall)
+{
+ if (tool is ChatCompletionsFunctionToolCall functionTool)
+ {
+ // Get the tool details:
+ string callId = functionTool.Id;
+ string toolName = functionTool.Name;
+ string toolArgumentsString = functionTool.Arguments;
+ Dictionary<string, object> toolArguments = JsonSerializer.Deserialize<Dictionary<string, object>>(toolArgumentsString);
+
+        // Here you have to call the function you defined. In this particular example we use
+        // reflection to find the method we defined before in a static class called
+ // `ChatCompletionsExamples`. Using reflection allows us to call a function
+ // by string name. Notice that this is just done for demonstration purposes as a
+ // simple way to get the function callable from its string name. Then we can call
+ // it with the corresponding arguments.
+
+ var flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static;
+ string toolResponse = (string)typeof(ChatCompletionsExamples).GetMethod(toolName, flags).Invoke(null, toolArguments.Values.Cast<object>().ToArray());
+
+ Console.WriteLine("->", toolResponse);
+ requestOptions.Messages.Add(new ChatRequestToolMessage(toolResponse, callId));
+ }
+ else
+ throw new Exception("Unsupported tool type");
+}
+```
+
+View the response from the model:
++
+```csharp
+response = client.Complete(requestOptions);
+```
+
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-* __Premium models__: Mistral Large (2402), Mistral Large (2407), and Mistral Small.
-* __Open models__: Mistral Nemo, Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01.
-All the premium models and Mistral Nemo (an open model) can be deployed as serverless APIs with pay-as-you-go token-based billing. The other open models can be deployed to managed computes in your own Azure subscription.
+```csharp
+try
+{
+ requestOptions = new ChatCompletionsOptions()
+ {
+ Messages = {
+ new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+ new ChatRequestUserMessage(
+ "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+ ),
+ },
+ };
+
+ response = client.Complete(requestOptions);
+ Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+ if (ex.ErrorCode == "content_filter")
+ {
+ Console.WriteLine($"Your query has trigger Azure Content Safeaty: {ex.Message}");
+ }
+ else
+ {
+ throw;
+ }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
-You can browse the Mistral family of models in the model catalog by filtering on the Mistral collection.
-## Mistral family of models
+
+## Mistral premium chat models
+
+The Mistral premium chat models include the following models:
# [Mistral Large](#tab/mistral-large)
-Mistral Large is Mistral AI's most advanced Large Language Model (LLM). It can be used on any language-based task, thanks to its state-of-the-art reasoning and knowledge capabilities. There are two variants available for the Mistral Large model version:
+Mistral Large is Mistral AI's most advanced Large Language Model (LLM). It can be used on any language-based task, thanks to its state-of-the-art reasoning and knowledge capabilities.
+
+Additionally, Mistral Large is:
+
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model and extra safety layer with the safe_mode option.
-- Mistral Large (2402)
-- Mistral Large (2407)
+And attributes of Mistral Large (2407) include:
-Additionally, some attributes of _Mistral Large (2402)_ include:
+* **Multi-lingual by design**. Supports dozens of languages, including English, French, German, Spanish, and Italian.
+* **Proficient in coding**. Trained on more than 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
+* **Agent-centric**. Possesses agentic capabilities with native function calling and JSON outputting.
+* **Advanced in reasoning**. Demonstrates state-of-the-art mathematical and reasoning capabilities.
-* __Specialized in RAG.__ Crucial information isn't lost in the middle of long context windows (up to 32-K tokens).
-* __Strong in coding.__ Code generation, review, and comments. Supports all mainstream coding languages.
-* __Multi-lingual by design.__ Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
-* __Responsible AI compliant.__ Efficient guardrails baked in the model and extra safety layer with the `safe_mode` option.
-And attributes of _Mistral Large (2407)_ include:
+The following models are available:
-- **Multi-lingual by design.** Supports dozens of languages, including English, French, German, Spanish, and Italian.
-- **Proficient in coding.** Trained on more than 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash. Also trained on more specific languages such as Swift and Fortran.
-- **Agent-centric.** Possesses agentic capabilities with native function calling and JSON outputting.
-- **Advanced in reasoning.** Demonstrates state-of-the-art mathematical and reasoning capabilities.
+* [Mistral-Large](https://aka.ms/azureai/landing/Mistral-Large)
+* [Mistral-Large-2407](https://aka.ms/azureai/landing/Mistral-Large-2407)
# [Mistral Small](#tab/mistral-small)
Mistral Small is Mistral AI's most efficient Large Language Model (LLM). It can
Mistral Small is:
-- **A small model optimized for low latency.** Efficient for high volume and low latency workloads. Mistral Small is Mistral's smallest proprietary model, it outperforms Mixtral-8x7B and has lower latency.
-- **Specialized in RAG.** Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
-- **Strong in coding.** Code generation, review, and comments. Supports all mainstream coding languages.
-- **Multi-lingual by design.** Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
-- **Responsible AI compliant.** Efficient guardrails baked in the model, and extra safety layer with the `safe_mode` option.
+* **A small model optimized for low latency**. Efficient for high volume and low latency workloads. Mistral Small is Mistral's smallest proprietary model; it outperforms Mixtral-8x7B and has lower latency.
+* **Specialized in RAG**. Crucial information isn't lost in the middle of long context windows (up to 32K tokens).
+* **Strong in coding**. Code generation, review, and comments. Supports all mainstream coding languages.
+* **Multi-lingual by design**. Best-in-class performance in French, German, Spanish, Italian, and English. Dozens of other languages are supported.
+* **Responsible AI compliant**. Efficient guardrails baked in the model, and extra safety layer with the safe_mode option.
-# [Mistral Nemo](#tab/mistral-nemo)
+The following models are available:
-Mistral Nemo is a cutting-edge Language Model (LLM) boasting state-of-the-art reasoning, world knowledge, and coding capabilities within its size category.
+* [Mistral-Small](https://aka.ms/azureai/landing/Mistral-Small)
-Mistral Nemo is a 12B model, making it a powerful drop-in replacement for any system using Mistral 7B, which it supersedes. It supports a context length of 128K, and it accepts only text inputs and generates text outputs.
-Additionally, Mistral Nemo is:
++
+> [!TIP]
+> Additionally, MistralAI supports the use of a tailored API for specific features of the model. To use the model-provider specific API, check the [MistralAI documentation](https://docs.mistral.ai/) or see the [inference examples](#more-inference-examples) section for code examples.
-- **Jointly developed with Nvidia.** This collaboration has resulted in a powerful 12B model that pushes the boundaries of language understanding and generation.
-- **Multilingual proficient.** Mistral Nemo is equipped with a tokenizer called Tekken, which is designed for multilingual applications. It supports over 100 languages, such as English, French, German, and Spanish. Tekken is more efficient than the Llama 3 tokenizer in compressing text for approximately 85% of all languages, with significant improvements in Malayalam, Hindi, Arabic, and prevalent European languages.
-- **Agent-centric.** Mistral Nemo possesses top-tier agentic capabilities, including native function calling and JSON outputting.
-- **Advanced in reasoning.** Mistral Nemo demonstrates state-of-the-art mathematical and reasoning capabilities within its size category.
+## Prerequisites
-
+To use Mistral premium chat models with Azure AI Studio, you need the following prerequisites:
+
+### A model deployment
+
+**Deployment to serverless APIs**
+
+Mistral premium chat models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-## Deploy Mistral family of models as a serverless API
+> [!div class="nextstepaction"]
+> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
+### A REST client
-**Mistral Large (2402)**, **Mistral Large (2407)**, **Mistral Small**, and **Mistral Nemo** can be deployed as a serverless API with pay-as-you-go billing and are offered by Mistral AI through the Microsoft Azure Marketplace. Mistral AI can change or update the terms of use and pricing of these models.
+Models deployed with the [Azure AI model inference API](https://aka.ms/azureai/modelinference) can be consumed using any REST client. To use the REST client, you need the following prerequisites:
-### Prerequisites
+* To construct the requests, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
+* Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
-- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
-- An [AI Studio hub](../how-to/create-azure-ai-resource.md). The serverless API model deployment offering for eligible models in the Mistral family is only available with hubs created in these regions:
+## Work with chat completions
- - East US
- - East US 2
- - North Central US
- - South Central US
- - West US
- - West US 3
- - Sweden Central
+In this section, you use the [Azure AI model inference API](https://aka.ms/azureai/modelinference) with a chat completions model for chat.
- For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md).
+> [!TIP]
+> The [Azure AI model inference API](https://aka.ms/azureai/modelinference) allows you to talk with most models deployed in Azure AI Studio with the same code and structure, including Mistral premium chat models.
-- An [Azure AI Studio project](../how-to/create-projects.md).
-- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the steps in this article, your user account must be assigned the __Azure AI Developer role__ on the resource group. For more information on permissions, see [Role-based access control in Azure AI Studio](../concepts/rbac-ai-studio.md).
+### Create a client to consume the model
-### Create a new deployment
+First, gather the endpoint URL and key for your deployment. When you use the REST API, you send requests directly to the endpoint and pass the key (or a Microsoft Entra ID token) in the `Authorization` header, as the following examples show.
-The following steps demonstrate the deployment of Mistral Large (2402), but you can use the same steps to deploy Mistral Nemo or any of the premium Mistral models by replacing the model name.
+### Get the model's capabilities
-To create a deployment:
+The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
-1. Sign in to [Azure AI Studio](https://ai.azure.com).
-1. Select **Model catalog** from the left sidebar.
-1. Search for and select the Mistral Large (2402) model to open its Details page.
+```http
+GET /info HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+```
- :::image type="content" source="../media/deploy-monitor/mistral/mistral-large-deploy-directly-from-catalog.png" alt-text="A screenshot showing how to access the model details page by going through the model catalog." lightbox="../media/deploy-monitor/mistral/mistral-large-deploy-directly-from-catalog.png":::
+The response is as follows:
-1. Select **Deploy** to open a serverless API deployment window for the model.
-1. Alternatively, you can initiate a deployment by starting from your project in AI Studio.
- 1. From the left sidebar of your project, select **Components** > **Deployments**.
- 1. Select **+ Create deployment**.
- 1. Search for and select the Mistral Large (2402) model to open the Model's Details page.
+```json
+{
+ "model_name": "Mistral-Large",
+ "model_type": "chat-completions",
+ "model_provider_name": "MistralAI"
+}
+```
- :::image type="content" source="../media/deploy-monitor/mistral/mistral-large-deploy-starting-from-project.png" alt-text="A screenshot showing how to access the model details page by going through the Deployments page in your project." lightbox="../media/deploy-monitor/mistral/mistral-large-deploy-starting-from-project.png":::
+### Create a chat completion request
- 1. Select **Confirm** to open a serverless API deployment window for the model.
+The following example shows how you can create a basic chat completions request to the model.
- :::image type="content" source="../media/deploy-monitor/mistral/mistral-large-deploy-pay-as-you-go.png" alt-text="A screenshot showing how to deploy a model as a serverless API." lightbox="../media/deploy-monitor/mistral/mistral-large-deploy-pay-as-you-go.png":::
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ]
+}
+```
-1. Select the project in which you want to deploy your model. To use the serverless API model deployment offering, your project must belong to one of the regions listed in the [prerequisites](#prerequisites).
-1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
-1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
-1. Select the **Subscribe and Deploy** button. If this is your first time deploying the model in the project, you have to subscribe your project for the particular offering. This step requires that your account has the **Azure AI Developer role** permissions on the resource group, as listed in the prerequisites. Each project has its own subscription to the particular Azure Marketplace offering of the model, which allows you to control and monitor spending. Currently, you can have only one deployment for each model within a project.
-1. Once you subscribe the project for the particular Azure Marketplace offering, subsequent deployments of the _same_ offering in the _same_ project don't require subscribing again. If this scenario applies to you, there's a **Continue to deploy** option to select.
+The response is as follows, where you can see the model's usage statistics:
- :::image type="content" source="../media/deploy-monitor/mistral/mistral-large-existing-subscription.png" alt-text="A screenshot showing a project that is already subscribed to the offering." lightbox="../media/deploy-monitor/mistral/mistral-large-existing-subscription.png":::
-1. Give the deployment a name. This name becomes part of the deployment API URL. This URL must be unique in each Azure region.
- :::image type="content" source="../media/deploy-monitor/mistral/mistral-large-deployment-name.png" alt-text="A screenshot showing how to indicate the name of the deployment you want to create." lightbox="../media/deploy-monitor/mistral/mistral-large-deployment-name.png":::
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Mistral-Large",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
-1. Select **Deploy**. Wait until the deployment is ready and you're redirected to the Deployments page.
-1. Select **Open in playground** to start interacting with the model.
-1. Return to the Deployments page, select the deployment, and note the endpoint's **Target** URL and the Secret **Key**. For more information on using the APIs, see the [reference](#reference-for-mistral-family-of-models-deployed-as-a-service) section.
-1. You can always find the endpoint's details, URL, and access keys by navigating to your **Project overview** page. Then, from the left sidebar of your project, select **Components** > **Deployments**.
+Inspect the `usage` section in the response to see the number of tokens used for the prompt, the total number of tokens generated, and the number of tokens used for the completion.
-To learn about billing for the Mistral AI model deployed as a serverless API with pay-as-you-go token-based billing, see [Cost and quota considerations for Mistral family of models deployed as a service](#cost-and-quota-considerations-for-mistral-family-of-models-deployed-as-a-service).
+#### Stream content
-### Consume the Mistral family of models as a service
+By default, the completions API returns the entire generated content in a single response. If you're generating long completions, waiting for the response can take many seconds.
-You can consume Mistral models by using the chat API.
+You can _stream_ the content to get it as it's being generated. Streaming content allows you to start processing the completion as content becomes available. This mode returns an object that streams back the response as [data-only server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Extract chunks from the delta field, rather than the message field.
-1. From your **Project overview** page, go to the left sidebar and select **Components** > **Deployments**.
-1. Find and select the deployment you created.
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "stream": true,
+ "temperature": 0,
+ "top_p": 1,
+ "max_tokens": 2048
+}
+```
-1. Copy the **Target** URL and the **Key** value.
+You can visualize how streaming generates content:
-1. Make an API request to either the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` or the native [Mistral Chat API](#mistral-chat-api) on `/v1/chat/completions`.
-For more information on using the APIs, see the [reference](#reference-for-mistral-family-of-models-deployed-as-a-service) section.
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Mistral-Large",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "role": "assistant",
+ "content": ""
+ },
+ "finish_reason": null,
+ "logprobs": null
+ }
+ ]
+}
+```
-## Reference for Mistral family of models deployed as a service
+The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
-Mistral models accept both the [Azure AI Model Inference API](../reference/reference-model-inference-api.md) on the route `/chat/completions` and the native [Mistral Chat API](#mistral-chat-api) on `/v1/chat/completions`.
-### Azure AI Model Inference API
+```json
+{
+ "id": "23b54589eba14564ad8a2e6978775a39",
+ "object": "chat.completion.chunk",
+ "created": 1718726371,
+ "model": "Mistral-Large",
+ "choices": [
+ {
+ "index": 0,
+ "delta": {
+ "content": ""
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
+```
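
For illustration, here is a minimal Python sketch (same placeholder endpoint and key as before) that reads the stream of server-sent events and prints the content carried in each chunk's `delta` field; the wire format is assumed to match the chunks shown above, including the `[DONE]` sentinel at the end of the stream.

```python
# Hedged sketch: stream a completion and assemble it from the delta fields.
import json
import requests

ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com"  # placeholder
KEY = "<your-api-key>"  # placeholder

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many languages are in the world?"},
    ],
    "stream": True,
}

with requests.post(
    f"{ENDPOINT}/chat/completions",
    headers={"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"},
    json=payload,
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # Data-only server-sent events: payload lines start with "data: ".
        if not line.startswith(b"data: "):
            continue
        body = line[len(b"data: "):]
        if body == b"[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(body)
        # Read partial content from the delta field, not the message field.
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)
```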
-The [Azure AI Model Inference API](../reference/reference-model-inference-api.md) schema can be found in the [reference for Chat Completions](../reference/reference-model-inference-chat-completions.md) article and an [OpenAPI specification can be obtained from the endpoint itself](../reference/reference-model-inference-api.md?tabs=rest#getting-started).
+#### Explore more parameters supported by the inference client
-### Mistral Chat API
+Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
-Use the method `POST` to send the request to the `/v1/chat/completions` route:
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "presence_penalty": 0.1,
+ "frequency_penalty": 0.8,
+ "max_tokens": 2048,
+ "stop": ["<|endoftext|>"],
+ "temperature" :0,
+ "top_p": 1,
+ "response_format": { "type": "text" }
+}
+```
-__Request__
-```rest
-POST /v1/chat/completions HTTP/1.1
-Host: <DEPLOYMENT_URI>
-Authorization: Bearer <TOKEN>
-Content-type: application/json
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718726686,
+ "model": "Mistral-Large",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 19,
+ "total_tokens": 91,
+ "completion_tokens": 72
+ }
+}
```
-#### Request schema
+If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
+
+#### Create JSON outputs
+
+Mistral premium chat models can create JSON outputs. Set `response_format` to `json_object` to enable JSON mode and guarantee that the message the model generates is valid JSON. You must also instruct the model to produce JSON yourself via a system or user message. Also, the message content might be partially cut off if `finish_reason="length"`, which indicates that the generation exceeded `max_tokens` or that the conversation exceeded the max context length.
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant that always generate responses in JSON format, using the following format: { \"answer\": \"response\" }"
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "response_format": { "type": "json_object" }
+}
+```
-Payload is a JSON formatted string containing the following parameters:
-| Key | Type | Default | Description |
-|--|--|--|--|
-| `messages` | `string` | No default. This value must be specified. | The message or history of messages to use to prompt the model. |
-| `stream` | `boolean` | `False` | Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available. |
-| `max_tokens` | `integer` | `8192` | The maximum number of tokens to generate in the completion. The token count of your prompt plus `max_tokens` can't exceed the model's context length. |
-| `top_p` | `float` | `1` | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with `top_p` probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering `top_p` or `temperature`, but not both. |
-| `temperature` | `float` | `1` | The sampling temperature to use, between 0 and 2. Higher values mean the model samples more broadly from the distribution of tokens. Zero means greedy sampling. We recommend altering this parameter or `top_p`, but not both. |
-| `ignore_eos` | `boolean` | `False` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. |
-| `safe_prompt` | `boolean` | `False` | Whether to inject a safety prompt before all conversations. |
+```json
+{
+ "id": "0a1234b5de6789f01gh2i345j6789klm",
+ "object": "chat.completion",
+ "created": 1718727522,
+ "model": "Mistral-Large",
+ "choices": [
+ {
+ "index": 0,
+ "message": {
+ "role": "assistant",
+ "content": "{\"answer\": \"There are approximately 7,117 living languages in the world today, according to the latest estimates. However, this number can vary as some languages become extinct and others are newly discovered or classified.\"}",
+ "tool_calls": null
+ },
+ "finish_reason": "stop",
+ "logprobs": null
+ }
+ ],
+ "usage": {
+ "prompt_tokens": 39,
+ "total_tokens": 87,
+ "completion_tokens": 48
+ }
+}
+```
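
As a rough sketch (placeholder endpoint and key again), the JSON mode request above could be issued and parsed from Python like this; the `json.loads` call assumes the model returned well-formed JSON, which is why checking `finish_reason` first is worthwhile.

```python
# Hedged sketch: request JSON mode and parse the returned JSON answer.
import json
import requests

ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com"  # placeholder
KEY = "<your-api-key>"  # placeholder

payload = {
    "messages": [
        {
            "role": "system",
            "content": 'You are a helpful assistant that always generates responses in JSON format, using the following format: { "answer": "response" }',
        },
        {"role": "user", "content": "How many languages are in the world?"},
    ],
    "response_format": {"type": "json_object"},
}

result = requests.post(
    f"{ENDPOINT}/chat/completions",
    headers={"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"},
    json=payload,
).json()

choice = result["choices"][0]
if choice["finish_reason"] == "length":
    print("Warning: the output was truncated and may not be valid JSON.")
answer = json.loads(choice["message"]["content"])
print(answer["answer"])
```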
-The `messages` object has the following fields:
+### Pass extra parameters to the model
-| Key | Type | Value |
-|--|--|--|
-| `content` | `string` | The contents of the message. Content is required for all messages. |
-| `role` | `string` | The role of the message's author. One of `system`, `user`, or `assistant`. |
+The Azure AI Model Inference API allows you to pass extra parameters to the model. The following code example shows how to pass the extra parameter `logprobs` to the model.
+Before you pass extra parameters to the Azure AI model inference API, make sure your model supports those extra parameters. When the request is made to the underlying model, the header `extra-parameters` is passed to the model with the value `pass-through`. This value tells the endpoint to pass the extra parameters to the model. Use of extra parameters with the model doesn't guarantee that the model can actually handle them. Read the model's documentation to understand which extra parameters are supported.
-#### Request example
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
-__Body__
```json
{
- "messages":
- [
- {
- "role": "system",
- "content": "You are a helpful assistant that translates English to Italian."
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
    },
    {
- "role": "user",
- "content": "Translate the following sentence from English to Italian: I love programming."
+ "role": "user",
+ "content": "How many languages are in the world?"
    }
    ],
- "temperature": 0.8,
- "max_tokens": 512,
+ "logprobs": true
}
```
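
A possible Python version of this request, under the same placeholder endpoint and key assumptions, adds the `extra-parameters: pass-through` header so that the `logprobs` field is forwarded to the underlying model rather than rejected by the endpoint.

```python
# Hedged sketch: forward an extra parameter (logprobs) to the underlying model.
import requests

ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com"  # placeholder
KEY = "<your-api-key>"  # placeholder

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many languages are in the world?"},
    ],
    "logprobs": True,
}

response = requests.post(
    f"{ENDPOINT}/chat/completions",
    headers={
        "Authorization": f"Bearer {KEY}",
        "Content-Type": "application/json",
        # Tell the endpoint to pass unknown parameters through to the model.
        "extra-parameters": "pass-through",
    },
    json=payload,
)
print(response.json()["choices"][0])
```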
-#### Response schema
+The following extra parameters can be passed to Mistral premium chat models:
-The response payload is a dictionary with the following fields:
+| Name | Description | Type |
+| -- | -- | -- |
+| `ignore_eos` | Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. | `boolean` |
+| `safe_mode` | Whether to inject a safety prompt before all conversations. | `boolean` |
-| Key | Type | Description |
-|--|--|-|
-| `id` | `string` | A unique identifier for the completion. |
-| `choices` | `array` | The list of completion choices the model generated for the input messages. |
-| `created` | `integer` | The Unix timestamp (in seconds) of when the completion was created. |
-| `model` | `string` | The model_id used for completion. |
-| `object` | `string` | The object type, which is always `chat.completion`. |
-| `usage` | `object` | Usage statistics for the completion request. |
-> [!TIP]
-> In streaming mode, `finish_reason` is `null` for each response chunk except the last one, which is terminated by a payload of `[DONE]`. In each `choices` object, the `messages` key is replaced by `delta`.
+### Safe mode
+
+Mistral premium chat models support the parameter `safe_prompt`. You can toggle the safe prompt to prepend your messages with the following system prompt:
+
+> Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
+
+The Azure AI Model Inference API allows you to pass this extra parameter as follows:
+
+```http
+POST /chat/completions HTTP/1.1
+Host: <ENDPOINT_URI>
+Authorization: Bearer <TOKEN>
+Content-Type: application/json
+extra-parameters: pass-through
+```
+
+```json
+{
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "How many languages are in the world?"
+ }
+ ],
+ "safemode": true
+}
+```
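
Sketch only, reusing the placeholder endpoint and key: enabling the safety prompt works the same way as any other extra parameter, so the request sets the flag in the payload (named `safe_prompt` here, following the preceding paragraph) and sends the `extra-parameters: pass-through` header.

```python
# Hedged sketch: ask the service to prepend Mistral's safety prompt.
import requests

ENDPOINT = "https://<your-deployment>.<region>.models.ai.azure.com"  # placeholder
KEY = "<your-api-key>"  # placeholder

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many languages are in the world?"},
    ],
    "safe_prompt": True,  # parameter name taken from the preceding prose; treat as an assumption
}

response = requests.post(
    f"{ENDPOINT}/chat/completions",
    headers={
        "Authorization": f"Bearer {KEY}",
        "Content-Type": "application/json",
        "extra-parameters": "pass-through",
    },
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])
```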
+
+### Use tools
+Mistral premium chat models support the use of tools, which can be an extraordinary resource when you need to offload specific tasks from the language model and instead rely on a more deterministic system or even a different language model. The Azure AI Model Inference API allows you to define tools in the following way.
-The `choices` object is a dictionary with the following fields:
+The following code example creates a tool definition that is able to look up flight information between two different cities.
-| Key | Type | Description |
-|--|--|--|
-| `index` | `integer` | Choice index. When `best_of` > 1, the index in this array might not be in order and might not be `0` to `n-1`. |
-| `messages` or `delta` | `string` | Chat completion result in `messages` object. When streaming mode is used, `delta` key is used. |
-| `finish_reason` | `string` | The reason the model stopped generating tokens: <br>- `stop`: the model hit a natural stop point or a provided stop sequence. <br>- `length`: the maximum number of tokens was reached. <br>- `content_filter`: when RAI moderates and CMP forces moderation. <br>- `content_filter_error`: an error occurred during moderation and a decision couldn't be made on the response. <br>- `null`: the API response is still in progress or incomplete.|
-| `logprobs` | `object` | The log probabilities of the generated tokens in the output text. |
+```json
+{
+