Data Partitioning in Structures

What is Data Partitioning

Data partitioning is a technique used to divide large volumes of information into smaller, more manageable parts called partitions.
In the context of BIMachine, this configuration is applied directly to structures (data models that connect to the database), allowing queries to be executed more efficiently.

In simple terms, partitioning works like “separating data into blocks”, so that BIMachine and the source database retrieve only the information an analysis actually needs.

Why Partitioning is Important

When working with large databases (with millions of records, for example), query response times tend to increase.
Partitioning helps reduce this time because it limits the amount of data read during each execution.

Key benefits:

  • Better query performance: the system only accesses relevant partitions.
  • More efficient use of resources: less load on the database and BIMachine.
  • Better data organization: facilitates understanding and control of large volumes of information.

Where to Configure Partitioning in BIMachine

Partitioning is defined when configuring the structure within the BIMachine Data Modeler.

How Partitioning Works in BIMachine

When configuring a structure in BIMachine, you can define a partitioning field related to time periods, such as year or month.

Thus, each partition represents a subset of data based on that field. For example:

  • A partition by year will create divisions such as 2022, 2023, 2024, etc.
  • A partition by month will create divisions such as 2024-01, 2024-02, 2024-03, and so on.

This way, when the user filters or analyzes information for a specific period, BIMachine queries only the partitions in the database that cover that period, which reduces the amount of data read.
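The idea of period-based partitions can be sketched in a few lines of Python. This is only an illustration of the concept, not BIMachine's implementation: the function names `partition_key` and `partitions_for_range` are hypothetical, and the labels simply follow the `2024` and `2024-01` formats used in the examples above.

```python
from datetime import date


def partition_key(d: date, granularity: str) -> str:
    """Return the partition label for a date: '2024' by year, '2024-03' by month."""
    if granularity == "year":
        return f"{d.year:04d}"
    if granularity == "month":
        return f"{d.year:04d}-{d.month:02d}"
    raise ValueError(f"unknown granularity: {granularity!r}")


def partitions_for_range(start: date, end: date, granularity: str) -> list[str]:
    """List the partitions a filter on [start, end] would need to read."""
    if start > end:
        raise ValueError("start must not be after end")
    keys: list[str] = []
    year, month = start.year, start.month
    # Walk month by month and collect each distinct partition label once.
    while (year, month) <= (end.year, end.month):
        key = partition_key(date(year, month, 1), granularity)
        if not keys or keys[-1] != key:
            keys.append(key)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys
```

For a filter from 15 November 2023 to 3 February 2024, a monthly scheme touches four partitions while a yearly scheme touches two larger ones. Every other partition is skipped.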

How to Define the Best Type of Partitioning

The ideal partitioning depends on how the data is queried and how it is stored in the database.

Some examples and best practices:

Scenario | Recommended Partitioning | Justification
Analyses are usually filtered by year (e.g., comparing 2023 vs. 2024) | By year | Reduces read volume, as each query accesses only one or a few years.
There is a large volume of data and queries are monthly (e.g., monthly billing reports) | By month | Improves performance in very large databases and more detailed queries.

Best Practices and Tips

  • Observe the filters most commonly used in analyses. Partitioning should reflect the pattern of data usage.
  • Avoid over-partitioning. Too many small partitions add overhead without bringing any real gain.
  • Review the configuration periodically. As data volume and usage change, it may be necessary to adjust the strategy.
  • Maintain consistency with the source database. If the database already has internal partitioning, BIMachine should ideally follow the same logic.
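One way to act on the first two tips is to count how many rows would land in each partition before choosing a granularity. The sketch below assumes you can export the values of the date field as Python `date` objects. The helper names `partition_sizes` and `summarize` are hypothetical and not part of BIMachine.

```python
from collections import Counter
from datetime import date


def partition_sizes(dates: list[date], granularity: str) -> Counter:
    """Count rows per partition label for the given granularity."""
    if granularity == "year":
        return Counter(f"{d.year:04d}" for d in dates)
    if granularity == "month":
        return Counter(f"{d.year:04d}-{d.month:02d}" for d in dates)
    raise ValueError(f"unknown granularity: {granularity!r}")


def summarize(dates: list[date], granularity: str) -> dict[str, int]:
    """Report partition count plus smallest and largest partition size."""
    sizes = partition_sizes(dates, granularity)
    if not sizes:
        return {"partitions": 0, "smallest": 0, "largest": 0}
    return {
        "partitions": len(sizes),
        "smallest": min(sizes.values()),
        "largest": max(sizes.values()),
    }
```

If the monthly summary shows many partitions with very few rows each, that hints at over-partitioning, and a yearly scheme may be the better fit.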

Conclusion

Partitioning data in structures is a feature that can improve the query performance of analyses and indicators in BIMachine, especially in environments with large volumes of information.
Dividing the data strategically, usually by year or month, makes analyses faster and more efficient.

However, there is no single ideal model.
Each user should assess how their data is queried and choose the type of partitioning that best fits their usage pattern.

For example, if analyses are often heavily filtered by year, it is recommended to partition by year.
In cases where the database has a large volume of data from different periods, partitioning by month may yield better results.

⚠️ Important: if the total volume of data in the structure is less than 1 million rows, partitioning is not recommended.
In these situations, the performance gain tends to be minimal, and partitioning can even generate unnecessary processing in the database.
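The guidance above can be condensed into a small decision helper. This is a minimal sketch under stated assumptions: `recommend_partitioning` and its `usual_filter` parameter are hypothetical names, and the only fixed number is the 1 million row threshold mentioned above. Real decisions should also weigh how the source database stores the data.

```python
# Threshold taken from the recommendation above: below this, do not partition.
MIN_ROWS_FOR_PARTITIONING = 1_000_000


def recommend_partitioning(total_rows: int, usual_filter: str) -> str:
    """Return 'none', 'year', or 'month' based on volume and the usual filter.

    usual_filter is the period users most often filter by: 'year' or 'month'.
    """
    if usual_filter not in ("year", "month"):
        raise ValueError(f"unknown filter period: {usual_filter!r}")
    if total_rows < MIN_ROWS_FOR_PARTITIONING:
        # Small structures gain little and may incur extra processing.
        return "none"
    return usual_filter
```

For example, a structure with 500,000 rows yields no partitioning regardless of the filter, while a structure with 5 million rows that is mostly filtered by month yields partitioning by month.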
