The Model of Agricultural Production and Its Impact on the Environment (MAgPIE) is a global land-use modeling framework that simulates agricultural production, land-use change, and environmental impacts under various socio-economic and climate scenarios (Potsdam Institute for Climate Impact Research (PIK), 2025). In our work, MAgPIE is used as a modeling tool to analyze key variables related to the Brazilian territory and its international interactions, including deforestation, harvested area in agriculture, crop production, cattle herd dynamics, and trade.
Within MAgPIE, spatial variables can be represented at several
levels of increasing spatial resolution: global, regional, cluster,
and cell. In the default configuration of MAgPIE version
4.10.0, which is the version adopted in our study, the world is
divided into 12 regions, with Brazil incorporated into the
LAM region, which encompasses Latin American countries.
Within the LAM region, MAgPIE defines 26 spatial
clusters, which are aggregations of cells grouped according to multiple
criteria, such as potential crop productivity, land availability, and
climatic conditions. Importantly, these clusters are not
necessarily spatially contiguous, meaning that a single cluster may
contain cells spread across multiple countries. This default spatial
configuration directly influences the MAgPIE outputs we analyze and
limits how well the specificities of the Brazilian context can be
represented. Because the cluster definitions do not align with
national boundaries, and certain variables, such as trade, are modeled
exclusively at the regional level, results cannot be cleanly attributed
to Brazil, which reduces the reliability of Brazil-specific analyses.
In light of these limitations, we modified the default MAgPIE
configuration in order to define Brazil as a distinct and isolated
region. In this new configuration, the clusters within the BRA region
correspond directly to the 27 Brazilian federative units (26 states
and the Federal District), referred to as states in the remainder of
this report. The purpose of this report is to document the steps
involved in creating this revised spatial structure, which includes
modifications to the core mapping files and the preprocessing of
relevant input data.
This report describes the input data files, details the processing steps performed, and explains the data preparation procedures for running MAgPIE considering a new region corresponding to Brazil. The objective is to guarantee transparency and reproducibility in data processing, as well as to tailor the input datasets to the scope of this study, which centers on analyses related to the national context of Brazil.
In the default configuration of MAgPIE, the global land surface is discretized into spatial grid cells at a \(0.5^\circ\) resolution, which are subsequently aggregated into 200 clusters to ensure computational efficiency while preserving regional heterogeneity. These clusters are organized into 12 world regions, each containing a specific number of clusters according to its geographic extent and socioeconomic relevance. This hierarchical structure—grid cell -> cluster -> region—provides the spatial framework for land-use allocation, production modeling, and policy analysis within MAgPIE. The overall distribution of clusters across regions is depicted in Figure 1, highlighting the spatial aggregation scheme adopted in the default setup.
```r
library(dplyr)
library(ggplot2)
library(ggspatial)

# Default cluster map (200 clusters, 12 regions)
df <- readRDS("clustermap_rev4.117_c200_67420_h12.rds")

# Split cell names on "." into longitude, latitude, and country code;
# coordinates use "p" as the decimal separator
coords <- strsplit(df$cell, "\\.")
mat <- do.call(rbind, coords)
df$lon <- as.numeric(gsub("p", ".", mat[, 1]))
df$lat <- as.numeric(gsub("p", ".", mat[, 2]))
df$iso <- mat[, 3]

# Region code = first three characters of the cluster name
df_plot <- df %>%
  mutate(cluster3 = substr(cluster, 1, 3))

# Count clusters per region and build legend labels such as "LAM (26)"
valores_unicos <- unique(df_plot$cluster)
resultado <- tibble(valor = valores_unicos) %>%
  mutate(regiao = substr(valor, 1, 3)) %>%
  count(regiao, name = "n") %>%
  mutate(regiao_final = paste0(regiao, " (", n, ")"))

df_final <- df_plot %>%
  left_join(resultado, by = c("cluster3" = "regiao"))

# One color per region (12 regions in the default setup)
cores_custom <- c(
  "#ED9659", "#3CB44B", "#FDD61C", "#898916",
  "#FF9999", "#9DCFC9", "#4363D8", "#43D4F4", "#820505",
  "#9A6425", "#911FB4", "#E52654"
)

ggplot(df_final, aes(x = lon, y = lat, fill = regiao_final)) +
  geom_tile() +
  coord_equal() +
  scale_fill_manual(
    values = cores_custom,
    guide = guide_legend(nrow = 2, title.position = "top")
  ) +
  theme_minimal() +
  labs(fill = "Region (number of clusters)") +
  theme(
    legend.position = "bottom",
    legend.box = "horizontal",
    legend.title.align = 0
  )
```

Figure 1: MAgPIE world regions and cluster settings (default version).
Focusing more specifically on the LAM region, which
comprises the countries of Latin America—including Brazil—this region is
represented in the default MAgPIE setup by 26 clusters. The spatial
distribution of these clusters, along with their constituent grid cells,
is illustrated in Figure 2, where each cluster is depicted in a distinct
color for visualization purposes. It is important to emphasize that grid
cells assigned to the same cluster are not required to be geographically
contiguous. As a result, a single cluster may group together areas that
share similar production conditions or land-use characteristics, even if
they are spatially dispersed across different countries of the region.
Of the 26 clusters in the LAM region, numbered 59 to 84, 18
contain at least one grid cell within Brazilian territory.
```r
library(raster)
library(sp)
library(Polychrome)

# Default cluster map, restricted to the LAM region
df <- readRDS("clustermap_rev4.117_c200_67420_h12.rds")
LAM <- subset(df, region == "LAM")

# Split cell names on "." into longitude, latitude, and country code
coords <- strsplit(LAM$cell, "\\.")
mat <- do.call(rbind, coords)
LAM$lon <- as.numeric(gsub("p", ".", mat[, 1]))
LAM$lat <- as.numeric(gsub("p", ".", mat[, 2]))
LAM$iso <- mat[, 3]
valores_unicos <- unique(LAM$cluster)

# Build one square polygon per grid cell
r <- rasterFromXYZ(cbind(LAM[, c("lon", "lat")], z = 1),
                   res = c(min(diff(sort(unique(LAM$lon)))),
                           min(diff(sort(unique(LAM$lat))))))
polys <- rasterToPolygons(r, dissolve = FALSE)

# Attach the cluster of each cell to its polygon
pts_sp <- SpatialPointsDataFrame(
  coords = LAM[, c("lon", "lat")],
  data = LAM[, "cluster", drop = FALSE],
  proj4string = CRS(proj4string(polys))  # same CRS as 'polys'
)
polys$cluster <- over(polys, pts_sp)$cluster

# One color per cluster
pal <- createPalette(length(valores_unicos), c("#ff0000", "#00ff00", "#0000ff"))
cols <- pal[match(polys$cluster, valores_unicos)]

plot(polys, col = cols, border = "grey80", lwd = 0.5)
legend("bottomleft", inset = c(0.05, 0),
       legend = valores_unicos,
       fill = pal,
       ncol = 3,
       cex = 0.6,
       pt.cex = 0.6)
```

Figure 2: Spatial grid cells in the LAM region, aggregated into clusters according to the MAgPIE default configuration.
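The count of LAM clusters that contain Brazilian cells, mentioned above, could be obtained from the cluster map along the following lines. This is a minimal sketch on a toy table: the cell names and cluster assignments are illustrative, and it assumes, as the plotting code does, that the country code is the third dot-separated field of the cell name.

```r
# Toy cluster map with the columns the report's code relies on
# (cell, region, cluster); values are illustrative, not real MAgPIE data.
toy <- data.frame(
  cell    = c("-60p25.-3p25.BRA", "-47p75.-15p75.BRA", "-70p25.-33p25.CHL",
              "-58p25.-34p75.ARG", "-51p25.-30p25.BRA", "-74p25.4p75.COL"),
  region  = "LAM",
  cluster = c("LAM.59", "LAM.67", "LAM.59", "LAM.60", "LAM.60", "LAM.61"),
  stringsAsFactors = FALSE
)

# The ISO3 country code is the third dot-separated field of the cell name
toy$iso <- vapply(strsplit(toy$cell, ".", fixed = TRUE), `[`, character(1), 3)

# Share of Brazilian cells in each cluster
bra_share <- tapply(toy$iso == "BRA", toy$cluster, mean)

# Clusters with at least one Brazilian cell, and clusters made up only of
# Brazilian cells (the latter disappear once Brazil becomes its own region)
clusters_with_brazil <- names(bra_share)[bra_share > 0]
clusters_only_brazil <- names(bra_share)[bra_share == 1]
```

Applied to the real LAM subset, `length(clusters_with_brazil)` would give the number of clusters that intersect Brazil, and `clusters_only_brazil` the clusters that lie entirely within it.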
To achieve our objective of analyzing MAgPIE results specifically for the Brazilian territory, we redefined the model's default configuration of regions and clusters. This required modifications to the mapping files responsible for defining these classifications.
The country-to-region mapping file defines the correspondence between countries and their respective regions. It consists of three data columns: the first contains the full country names, the second the corresponding country codes, and the third the associated regional codes. To create the new region, a single change was made to this file: in the third column, the region code for Brazil was set to BRA.
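The change to the country-to-region mapping can be illustrated with a small sketch. The column names and the toy rows below are hypothetical; the real file's headers and contents may differ.

```r
# Toy country-to-region mapping with the three columns described above.
# Column names are hypothetical; the real file's headers may differ.
mapping <- data.frame(
  country_name = c("Argentina", "Brazil", "Chile", "Germany"),
  country_code = c("ARG", "BRA", "CHL", "DEU"),
  region_code  = c("LAM", "LAM", "LAM", "EUR"),
  stringsAsFactors = FALSE
)

# Single change: move Brazil out of LAM into its own region
mapping$region_code[mapping$country_code == "BRA"] <- "BRA"

# Quick check: exactly one country is assigned to the new region
n_bra <- sum(mapping$region_code == "BRA")
```

Keeping the edit to a single cell of the table leaves every other country's assignment untouched, which makes the modified mapping easy to compare against the default one.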
Cluster redefinition is primarily implemented through changes to the cluster mapping file, which maps each grid cell to its corresponding region, country, and cluster. In the default configuration, cluster allocation follows a global optimization procedure based on multiple criteria, including potential agricultural productivity, land and water availability, land-use patterns, climatic conditions, population density, and food demand. Although this approach is suitable for global modeling, it does not adequately capture the heterogeneity and regional characteristics of the Brazilian environment.
Therefore, we introduced a new clustering scheme in which the
Brazilian territory is subdivided into 27 clusters, each corresponding
to one of the Brazilian states. With the separation of Brazil from the
LAM region, two clusters of the default configuration, LAM.67 and
LAM.70, no longer exist because all of their cells lie within
Brazilian territory. The remaining clusters were therefore renumbered
to keep the numbering continuous. Figure 3 shows the new
configuration.
```r
library(dplyr)
library(ggplot2)
library(ggspatial)

# New cluster map (225 clusters, 13 regions, Brazil as its own region)
df <- readRDS("clustermap_rev4.117_c225_67420_h13.rds")

# Split cell names on "." into longitude, latitude, and country code;
# coordinates use "p" as the decimal separator
coords <- strsplit(df$cell, "\\.")
mat <- do.call(rbind, coords)
df$lon <- as.numeric(gsub("p", ".", mat[, 1]))
df$lat <- as.numeric(gsub("p", ".", mat[, 2]))
df$iso <- mat[, 3]

# Region code = first three characters of the cluster name
df_plot <- df %>%
  mutate(cluster3 = substr(cluster, 1, 3))

# Count clusters per region and build legend labels such as "BRA (27)"
valores_unicos <- unique(df_plot$cluster)
resultado <- tibble(valor = valores_unicos) %>%
  mutate(regiao = substr(valor, 1, 3)) %>%
  count(regiao, name = "n") %>%
  mutate(regiao_final = paste0(regiao, " (", n, ")"))

df_final <- df_plot %>%
  left_join(resultado, by = c("cluster3" = "regiao"))

# One color per region (13 regions in the new setup)
cores_custom <- c(
  "#A8A8A8",
  "#ED9659", "#3CB44B", "#FDD61C", "#898916",
  "#FF9999", "#9DCFC9", "#4363D8", "#43D4F4", "#820505",
  "#9A6425", "#911FB4", "#E52654"
)

ggplot(df_final, aes(x = lon, y = lat, fill = regiao_final)) +
  geom_tile() +
  coord_equal() +
  scale_fill_manual(
    values = cores_custom,
    guide = guide_legend(nrow = 2, title.position = "top")
  ) +
  theme_minimal() +
  labs(fill = "Region (number of clusters)") +
  theme(
    legend.position = "bottom",
    legend.box = "horizontal",
    legend.title.align = 0
  )
```

Figure 3: MAgPIE new world regions and cluster settings (Brazil version).
Removing the Brazilian cells from the LAM clusters and reallocating them to the newly defined BRA clusters modifies the original structure of the LAM region, as illustrated in Figure 4.
```r
library(raster)
library(sp)
library(Polychrome)

# New cluster map, restricted to the LAM region (now without Brazil)
map <- readRDS("clustermap_rev4.117_c225_67420_h13.rds")
LAM <- subset(map, region == "LAM")

# Split cell names on "." into longitude, latitude, and country code
coords <- strsplit(LAM$cell, "\\.")
mat <- do.call(rbind, coords)
LAM$lon <- as.numeric(gsub("p", ".", mat[, 1]))
LAM$lat <- as.numeric(gsub("p", ".", mat[, 2]))
LAM$iso <- mat[, 3]
valores_unicos <- unique(LAM$cluster)

# Build one square polygon per grid cell
r <- rasterFromXYZ(cbind(LAM[, c("lon", "lat")], z = 1),
                   res = c(min(diff(sort(unique(LAM$lon)))),
                           min(diff(sort(unique(LAM$lat))))))
polys <- rasterToPolygons(r, dissolve = FALSE)

# Attach the cluster of each cell to its polygon
pts_sp <- SpatialPointsDataFrame(
  coords = LAM[, c("lon", "lat")],
  data = LAM[, "cluster", drop = FALSE],
  proj4string = CRS(proj4string(polys))  # same CRS as 'polys'
)
polys$cluster <- over(polys, pts_sp)$cluster

# One color per remaining LAM cluster
pal <- createPalette(length(valores_unicos), c("#ff0000", "#00ff00", "#0000ff"))
cols <- pal[match(polys$cluster, valores_unicos)]

plot(polys, col = cols, border = "grey80", lwd = 0.5)
legend("bottomleft", inset = c(0.05, 0),
       legend = valores_unicos,
       fill = pal,
       ncol = 3,
       cex = 0.6,
       pt.cex = 0.6)
```

Figure 4: Spatial grid cells in the LAM region, aggregated into clusters according to the new settings.
The new clusters are named sequentially from BRA.199 to
BRA.225. This modification allows the model to reflect more
accurately the socio-environmental diversity present across Brazil and
enables more detailed spatial analyses.
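The reallocation and renumbering described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually applied to the mapping file: the `state` column is hypothetical, the toy clusters use small numbers, and the choice to append the state clusters after the last remaining cluster is inferred from the BRA.199 to BRA.225 numbering.

```r
# Toy cell table; `state` is a hypothetical column holding the Brazilian
# state of each Brazilian cell (NA elsewhere). Values are illustrative.
cells <- data.frame(
  iso     = c("ARG", "BRA", "BRA", "CHL", "BRA", "USA", "BRA"),
  state   = c(NA, "SP", "AM", NA, "SP", NA, "AM"),
  region  = c("LAM", "LAM", "LAM", "LAM", "LAM", "USA", "LAM"),
  cluster = c(1L, 1L, 2L, 3L, 3L, 4L, 2L),
  stringsAsFactors = FALSE
)

is_bra <- cells$iso == "BRA"

# 1. Renumber the clusters that still hold non-Brazilian cells, so the
#    numbering stays continuous after fully Brazilian clusters disappear
#    (cluster 2 in this toy example).
kept <- sort(unique(cells$cluster[!is_bra]))
new_id <- match(cells$cluster, kept)

# 2. Append one cluster per state after the last remaining cluster.
states <- sort(unique(cells$state[is_bra]))
new_id[is_bra] <- length(kept) + match(cells$state[is_bra], states)

# 3. Rebuild region and cluster labels in the "<region>.<number>" style.
cells$region[is_bra] <- "BRA"
cells$cluster <- paste(cells$region, new_id, sep = ".")
```

In the toy table the three surviving clusters become 1 to 3 and the two states become BRA.4 and BRA.5, mirroring how the 198 remaining default clusters are followed by the 27 state clusters.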
Moreover, unlike the default clustering, the new configuration imposes a spatial-contiguity constraint: all cells within a Brazilian cluster must form a single geographically connected area. This ensures that each cluster represents a continuous geographic region, improving the interpretability of spatial patterns and reducing distortions related to non-contiguous cluster assignments. It is important to emphasize that the current cluster configuration, which corresponds to the Brazilian states, is a preliminary setup intended solely to test the model's adaptability to a new spatial configuration. In future developments, we aim to enhance the spatial resolution by letting each cluster correspond to a single grid cell within Brazil, thereby allowing for fully disaggregated, cell-level modeling across the national territory. Figure 5 illustrates the new spatial configuration adopted.
```r
library(raster)
library(sp)
library(Polychrome)

# New cluster map, restricted to the BRA region
map <- readRDS("clustermap_rev4.117_c225_67420_h13.rds")
BRA <- subset(map, region == "BRA")

# Split cell names on "." into longitude, latitude, and country code
coords <- strsplit(BRA$cell, "\\.")
mat <- do.call(rbind, coords)
BRA$lon <- as.numeric(gsub("p", ".", mat[, 1]))
BRA$lat <- as.numeric(gsub("p", ".", mat[, 2]))
BRA$iso <- mat[, 3]
valores_unicos <- unique(BRA$cluster)

# Build one square polygon per grid cell
r <- rasterFromXYZ(cbind(BRA[, c("lon", "lat")], z = 1),
                   res = c(min(diff(sort(unique(BRA$lon)))),
                           min(diff(sort(unique(BRA$lat))))))
polys <- rasterToPolygons(r, dissolve = FALSE)

# Attach the cluster of each cell to its polygon
pts_sp <- SpatialPointsDataFrame(
  coords = BRA[, c("lon", "lat")],
  data = BRA[, "cluster", drop = FALSE],
  proj4string = CRS(proj4string(polys))  # same CRS as 'polys'
)
polys$cluster <- over(polys, pts_sp)$cluster

# One color per state cluster
pal <- createPalette(length(valores_unicos), c("#ff0000", "#00ff00", "#0000ff"))
cols <- pal[match(polys$cluster, valores_unicos)]

plot(polys, col = cols, border = "grey80", lwd = 0.5)
legend("bottomleft", inset = c(0.05, 0),
       legend = valores_unicos,
       fill = pal,
       ncol = 3,
       cex = 0.6,
       pt.cex = 0.6)
```

Figure 5: Spatial grid cells in the BRA region, aggregated into clusters according to the new settings.
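The spatial-contiguity constraint on the Brazilian clusters could be verified with a simple connectivity check on the grid. The sketch below is one possible approach, not the procedure used in this study: `is_contiguous()` is a hypothetical helper, the toy coordinates are illustrative, and treating cells that share an edge or a corner as neighbours is an assumption.

```r
# Returns TRUE if all cells of one cluster form a single connected patch on
# a regular grid. Cells sharing an edge or a corner count as neighbours
# (8-neighbourhood); this choice is an assumption of the sketch.
is_contiguous <- function(lon, lat, res = 0.5) {
  # Convert coordinates to integer grid indices to avoid float comparisons
  ix <- round((lon - min(lon)) / res)
  iy <- round((lat - min(lat)) / res)
  visited <- rep(FALSE, length(ix))
  visited[1] <- TRUE
  queue <- 1L
  # Breadth-first search starting from the first cell
  while (length(queue) > 0) {
    cur <- queue[1]
    queue <- queue[-1]
    nb <- which(!visited & abs(ix - ix[cur]) <= 1 & abs(iy - iy[cur]) <= 1)
    visited[nb] <- TRUE
    queue <- c(queue, nb)
  }
  all(visited)
}

# Toy cluster table (illustrative coordinates, not real MAgPIE cells)
toy <- data.frame(
  cluster = c("BRA.199", "BRA.199", "BRA.199", "BRA.200", "BRA.200"),
  lon = c(-50.25, -49.75, -49.75, -60.25, -55.25),
  lat = c(-10.25, -10.25, -9.75, -3.25, -3.25)
)

# One TRUE/FALSE per cluster
contiguous <- vapply(
  split(toy, toy$cluster),
  function(d) is_contiguous(d$lon, d$lat),
  logical(1)
)
```

In the toy data the first cluster is a connected patch while the second consists of two cells several degrees apart, so a check like this would flag the second one for review.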
This chapter presents the input data files required to run the model and describes the processing steps performed to prepare them. The main goal was to reprocess the data considering the newly defined BRA region, ensuring that all inputs were consistent with the requirements of the model.
The input files used for running the model are available in the public PIK repository (Potsdam Institute for Climate Impact Research (PIK), 2018), organized into five input data bundles. Each bundle was thoroughly reviewed to ensure that all datasets were adapted to the new regional configuration. The following sections present all processed input bundles along with their corresponding details.
The preprocessing procedures involved the adaptation of existing routines and the development of new ones when necessary. Several challenges emerged during this process, mainly due to the limited availability of the original scripts used by the model developers. As a result, additional adjustments and manual harmonization steps were required.
The data processing phase revealed a series of technical issues related to data structure, metadata consistency, and intermediate scripts. This section documents these challenges and provides context on their origin.
The FAOSTAT datasets (Food and Agriculture Organization of the United Nations (FAO), 2025) used to compute the model inputs have recently undergone a structural reorganization. As a result, the automated download functions can no longer be executed successfully, as they consistently return the following error:
```
Error in download.file(faoMeta$FileLocation, destfile = destfile, mode = "wb"): invalid url argument
```

The access keys for FAO datasets available through the Bulk Download link were modified following a structural reorganization of the platform. As a result, the data download process has become more effort-intensive, since it is now necessary to identify the corresponding database for each dataset and, in some cases, adapt it to the previous structure to ensure compatibility with existing reading functions. Consequently, a comprehensive adaptation of all FAO datasets used by MAgPIE is currently underway to fully reproduce the processing pipeline. Below is a list of the datasets previously used and their corresponding replacements in the new workflow.
Even after these adjustments, the FAOSTAT databases could not be automatically read and processed by the corresponding functions. Consequently, some files were modified and will be examined in greater detail at a later stage to identify and implement potential corrections.
A recurring issue involved datasets derived from scientific publications, which are not always provided in a structured, ready-to-use format, nor made available for automated access via download links. In such cases, no dedicated download functions exist, and the following error was encountered:
```
ERROR: Sourcefolder does not contain data for the requested source type = type subtype = subtype and there is no download script which could provide the missing data. Please check your settings!
```

In these cases, it was necessary to manually locate the required datasets and apply the appropriate adjustments so that the data-reading function could be used.
This section presents the main datasets used as input for the model.
The preprocessing procedures were conducted in the R environment, employing the packages and functions recommended by the model developers. Among these, the madrat package (version 3.24.1) and its associated dependencies played a central role. This package, specifically designed for the preprocessing of input data used within the MAgPIE modeling framework, was essential to ensure consistency and reproducibility in data preparation.
One of the most frequently employed functions in this process was
calcOutput() from the madrat package. This function was
developed as a wrapper for specific routines designed to handle the
various types of outputs utilized within the model. When executed with a
specified output type, calcOutput() calls the corresponding
function from one of the auxiliary preprocessing packages and performs
the entire workflow, ranging from downloading the required datasets to
aggregating the data by regions. It is also possible to provide a file
containing country-to-region mappings, which enables the function to
execute the regional aggregation step automatically.
Another widely used function was toolAggregate(), also
from the madrat package. This function performs the aggregation (or
disaggregation) of a dataset according to a relation matrix or mapping.
In addition, it allows for the inclusion of weights, which are applied
in the calculation of the final aggregated values.
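The aggregation logic described above can be illustrated in base R. The sketch below does not call madrat and is not its implementation; it only shows the idea of aggregating country values to regions through a mapping, with and without weights. All names and numbers are illustrative.

```r
# Toy country-level data and a country-to-region mapping (illustrative).
values  <- c(ARG = 10, BRA = 40, CHL = 5)      # quantity that can be summed
yields  <- c(ARG = 3.0, BRA = 4.0, CHL = 2.0)  # quantity that must be averaged
weights <- c(ARG = 2, BRA = 6, CHL = 1)        # e.g. area used as weight
region  <- c(ARG = "LAM", BRA = "BRA", CHL = "LAM")

# Unweighted aggregation: country values are summed within each region
agg_sum <- tapply(values, region, sum)

# Weighted aggregation: the regional value is the weighted mean of the
# country values, using the supplied weights
agg_wmean <- tapply(yields * weights, region, sum) /
  tapply(weights, region, sum)
```

With Brazil mapped to its own region, the BRA aggregate equals the Brazilian country value, while the LAM aggregate is computed from the remaining countries only. This is the effect the new country-to-region mapping is meant to produce in the regional input files.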
This bundle contains files processed at all levels, including: cell, country, global, regional, and cluster. Files at the cell, country, and global levels do not require additional processing.
This bundle contains the largest number of files, most of which are already processed at the regional level. However, it also includes datasets at the global and country levels, which do not require additional processing.
This bundle contains a validation file that was not used in this processing step.
This bundle contains global-level files; therefore, they were not processed again. The original files were used.
This bundle contains regional-level calibration files. These files still need to be studied in greater detail to refine the process; however, for the new BRA region, the same values as those for the LAM region were initially used.
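The initial treatment of the calibration files, in which BRA reuses the LAM values, amounts to duplicating the LAM entries under the new region code. A minimal sketch follows, assuming a simple table layout; the column names and values are hypothetical and the real calibration files may be structured differently.

```r
# Toy regional calibration table; column names and values are hypothetical.
calib <- data.frame(
  region       = c("EUR", "LAM", "USA"),
  calib_factor = c(1.10, 0.95, 1.02),
  stringsAsFactors = FALSE
)

# Initial assumption described above: BRA inherits the LAM values
bra_row <- calib[calib$region == "LAM", ]
bra_row$region <- "BRA"

# Append the new region and keep the table ordered by region code
calib <- rbind(calib, bra_row)
calib <- calib[order(calib$region), ]
rownames(calib) <- NULL
```

Because the copied values were calibrated for the whole LAM region, they are only a starting point, in line with the need for further study noted above.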
The process of creating new clusters within a newly defined region in the MAgPIE configuration involved both straightforward and complex steps. Given that input data preparation is a crucial step when modifying MAgPIE’s spatial structure, the preprocessing phase plays a key role in refining the resolution and reliability of model outputs for Brazil. While initial tasks—such as acquiring input data and modifying mapping files—were relatively simple, the workflow also required extensive, time-consuming efforts to identify and resolve errors that emerged during the generation of new input files based on the updated spatial configuration.
For the next phases, once all input data has been properly prepared, we aim to replicate the analyses performed using MAgPIE’s original spatial structure. The objective is to compare the key output variables—such as harvested crop areas, cattle herd production, and deforestation—between the newly customized spatial configuration and the original setup. Once more, we will benchmark our results against official Brazilian public datasets to validate consistency and enhance reliability. Additionally, by defining Brazil as a standalone region, we will be able to explicitly evaluate trade flows between Brazil and other global regions, enabling more accurate analyses of international trade dynamics involving Brazilian agricultural and land-use sectors.