README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# quanteda.llm

<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/quanteda.llm)](https://CRAN.R-project.org/package=quanteda.llm)
[![R-CMD-check](https://github.com/quanteda/quanteda.llm/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/quanteda/quanteda.llm/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

The **quanteda.llm** package makes it easy to use LLMs with quanteda.corpora (or data frames), to enable classification, summarisation, scoring, and analysis of documents and text. **quanteda** provides a host of convenient functions for managing, manipulating, and describing corpora as well as linking their document variables and metadata to these documents.  **quanteda.llm** makes it convenient to link these to LLMs for analysing or classifying these texts, creating new variables from what is created by the LLMs. Using a tidy approach and linking to the new **quanteda.tidy** package, we enable convenient operations using common Tidyverse functions for manipulating LLM-created obejcts and variables.

# Included functions

The package includes the following functions:

- `ai_summarize`: Summarizes documents in a corpus.
- `ai_label`: Labels documents in a corpus according to a content analysis scheme.
- `ai_score`: Scores documents in a corpus according to a defined scale.

More to follow.

# Supported LLMs

The package supports all LLMs currently available with the `ellmer` package, including:

- Anthropic’s Claude: `chat_claude`.
- AWS Bedrock: `chat_bedrock`.
- Azure OpenAI: `chat_azure`.
- Databricks: `chat_databricks`.
- DeepSeek: `chat_deepseek`.
- GitHub model marketplace: `chat_github`.
- Google Gemini: `chat_gemini`.
- Groq: `chat_groq`.
- Ollama: `chat_ollama`.
- OpenAI: `chat_openai`.
- OpenRouter: `chat_openrouter`.
- perplexity.ai: `chat_perplexity`.
- Snowflake Cortex: `chat_snowflake` and `chat_cortex_analyst`.
- VLLM: `chat_vllm`.

For authentication and usage of each of these LLMs, please refer to the respective `ellmer` documentation [here](https://ellmer.tidyverse.org/reference/index.html). **For example,** to use the `chat_ollama` models, first download and install [Ollama](https://ollama.com/). Then install some models either from the command line (e.g. with ollama pull llama3.1) or within R using the `rollama` package. The Ollama app must be running for the models to be used. To use the `chat_openai` models, you would need to sign up for an API key from OpenAI which you can save in your `.Renviron` file as `OPENAI_API_KEY`.

## Installation

You can install the development version of quanteda.llm from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("quanteda/quanteda.llm")
```

## Examples

### Using `ai_summarize`
```{.r}
library(quanteda)
library(quanteda.llm)
library(quanteda.tidy)
corpus <- quanteda::data_corpus_inaugural %>%
  quanteda.tidy::mutate(llm_sum = ai_summarize(text, chat_fn = chat_ollama, model = "llama3.2"))
# llm_sum is created as a new docvar in the corpus
```

### Using `ai_label`
```{.r}
library(quanteda)
library(quanteda.llm)
library(quanteda.tidy)
label = "Label the following document based on how much it aligns with the political left, center, or right. 
         The political left is defined as groups which advocate for social equality, government intervention in the economy, and progressive policies.
         The political center typically supports a balance between progressive and conservative views, favoring moderate policies and compromise. 
         The political right generally advocates for individualism,                  
         free-market capitalism, and traditional values."
corpus <- quanteda::data_corpus_inaugural %>%
  quanteda.tidy::mutate(llm_label = ai_label(text, chat_fn = chat_ollama, model = "llama3.2", label = label))
# llm_label is created as a new docvar in the corpus
```

### Using `ai_score`
```{.r}
library(quanteda)
library(quanteda.llm)
library(quanteda.tidy)
scale = "Score the following document on a scale of how much it aligns
         with the political left. The political left is defined as groups which 
         advocate for social equality, government intervention in the economy, 
         and progressive policies. Use the following metrics: 
         SCORING METRIC:
         1 : extremely left
         0 : not at all left"
corpus <- quanteda::data_corpus_inaugural %>%
  quanteda.tidy::mutate(llm_score = ai_score(text, chat_fn = chat_ollama, model = "llama3.2", scale = scale))
# llm_score is created as a new docvar in the corpus
```