
Conducting multi-regional research? HDRN Canada can help!
Distributed Analysis is a class of methods for conducting multi-regional data analysis without pooling data in a single location. How does it work? Data are first harmonized, creating a single dataset that is partitioned across Trusted Research Environments (TREs). Analyses are run in each TRE, and summary results are shared with a coordinating centre. The coordinating node integrates information from all data centres to generate the final analysis. The final results can then be accessed by the researcher. Interested? Contact our Data Access Support Hub (DASH) team: dash@hdrn.ca. Download our Distributed Analysis Factsheet.
Distributed Analysis Resources


Federated Analysis: State of the Science Collective Learning Series
In 2024, HDRN Canada launched Federated Analysis: State of the Science Collective Learning Series, a limited webinar series that convened leading thinkers and experts to explore and share current knowledge about federated analytics, a particular type of distributed analysis. The series covered a broad range of topics, including examining the benefits and limitations of various distributed analysis approaches, demonstrating practical examples of statistical analyses with distributed data, describing Trusted Research Environments across Canada and AI analytics in a federated landscape. View and share the recordings!
Frequently Asked Questions
A distributed analysis is a statistical analysis of a distributed data source in which intermediate statistical parameters are exchanged between the data nodes and the coordinating node. No line-level data are ever exchanged in a distributed analysis.
In a distributed analysis, statistical parameters are shared between trusted environments and then integrated to produce the result. While the process of distributed analysis is different from pooled analysis, the results are essentially identical (to the second decimal) in the methods we currently support. The code for distributed analysis is now available as an open-source package on the web: GitHub Distributed Analysis Resources.
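As a minimal sketch of why integrating shared parameters can reproduce a pooled result, consider a linear regression with a single predictor. Each node reports only five sums, and the coordinating node adds them up and solves the usual least-squares equations. The names `NodeSummary`, `summarize` and `combine` are hypothetical and chosen for illustration; this is not the code from the HDRN Canada package, and the supported methods may be implemented differently.

```python
from typing import List, NamedTuple, Sequence, Tuple


class NodeSummary(NamedTuple):
    """Aggregate statistics a node would share (no line-level data)."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarize(xs: Sequence[float], ys: Sequence[float]) -> NodeSummary:
    """Runs inside a data node: reduce local records to five sums."""
    return NodeSummary(
        n=len(xs),
        sum_x=sum(xs),
        sum_y=sum(ys),
        sum_xx=sum(x * x for x in xs),
        sum_xy=sum(x * y for x, y in zip(xs, ys)),
    )


def combine(summaries: List[NodeSummary]) -> Tuple[float, float]:
    """Runs at the coordinating node: add the sums, then solve for
    the least-squares intercept and slope."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Illustrative data held at two separate nodes (made up for this sketch).
node_a = summarize([1, 2, 3], [3, 5, 7])
node_b = summarize([4, 5], [9, 11])
distributed_fit = combine([node_a, node_b])

# The same records analysed as if they had been pooled in one place.
pooled_fit = combine([summarize([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])])
```

Because the sums are additive across nodes, `distributed_fit` and `pooled_fit` agree up to floating-point rounding.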
A distributed analysis enables the inclusion of data that cannot be moved, or pooled, because of legal, ethical, policy and/or social acceptability restrictions.
The statistical parameters exchanged are abstract mathematical quantities, such as matrices. For more information on what is being exchanged, please see GitHub Distributed Analysis Resources, which uses logistic regression as an example.
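To give a sense of what such quantities can look like for a logistic regression, here is a hedged sketch using a textbook Newton-Raphson scheme with an intercept and one predictor. At each iteration, every node sends a gradient vector and a 2x2 information matrix computed at the current coefficients, and the coordinating node sums them and updates the coefficients. The function names and the data are hypothetical, and the exact quantities exchanged by the HDRN Canada package may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]
Matrix = Tuple[Vector, Vector]
NodeData = Tuple[Sequence[float], Sequence[int]]


def node_contribution(beta: Vector, xs: Sequence[float],
                      ys: Sequence[int]) -> Tuple[Vector, Matrix]:
    """Runs inside a data node: return the gradient and the information
    matrix at `beta`. Only these aggregates leave the node."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), ((h00, h01), (h01, h11))


def newton_step(beta: Vector,
                contributions: List[Tuple[Vector, Matrix]]) -> Vector:
    """Runs at the coordinating node: sum the node contributions and
    take one Newton-Raphson step (closed-form 2x2 solve)."""
    g0 = sum(g[0] for g, _ in contributions)
    g1 = sum(g[1] for g, _ in contributions)
    h00 = sum(h[0][0] for _, h in contributions)
    h01 = sum(h[0][1] for _, h in contributions)
    h11 = sum(h[1][1] for _, h in contributions)
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return beta[0] + d0, beta[1] + d1


def fit(nodes: List[NodeData], iterations: int = 25) -> Vector:
    """Alternate between node-side summaries and coordinator updates."""
    beta: Vector = (0.0, 0.0)
    for _ in range(iterations):
        contributions = [node_contribution(beta, xs, ys) for xs, ys in nodes]
        beta = newton_step(beta, contributions)
    return beta


# Illustrative records held at two nodes (made up for this sketch).
node_a: NodeData = ([0, 1, 2, 3], [0, 0, 1, 0])
node_b: NodeData = ([2, 3, 4, 5], [1, 0, 1, 1])
distributed_beta = fit([node_a, node_b])

# The same records fitted as a single pooled dataset, for comparison.
pooled_beta = fit([([0, 1, 2, 3, 2, 3, 4, 5], [0, 0, 1, 0, 1, 0, 1, 1])])
```

In this sketch the summed gradient and information matrix are the same whether the records sit in one place or several, so the distributed and pooled coefficients coincide up to rounding.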
The parameters can be expressed in different ways. To enhance interoperability between various statistical analysis tools, they are currently exchanged via text files (CSV files). These files can be exchanged in various ways, such as email, OneDrive or custom platforms. Currently, HDRN Canada uses a web-based platform (PARS3) to coordinate the exchange of the parameters.
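As a rough illustration of how a matrix of parameters could travel as CSV text, the sketch below writes a matrix to a CSV string and reads it back without loss of precision. The layout (one matrix row per line, no header) and the function names are assumptions made for this example; they are not the file format used by PARS3 or by the HDRN Canada package.

```python
import csv
import io
from typing import List

Matrix = List[List[float]]


def matrix_to_csv(matrix: Matrix) -> str:
    """Serialize a matrix as CSV text, one matrix row per line.
    repr() keeps enough digits for floats to round-trip exactly."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in matrix:
        writer.writerow([repr(float(value)) for value in row])
    return buffer.getvalue()


def csv_to_matrix(text: str) -> Matrix:
    """Parse CSV text produced by matrix_to_csv back into a matrix."""
    reader = csv.reader(io.StringIO(text))
    return [[float(cell) for cell in row] for row in reader if row]


# A made-up 2x2 matrix standing in for a node's intermediate parameters.
local_matrix = [[2.0, 5.0], [5.0, 17.0 / 3.0]]
payload = matrix_to_csv(local_matrix)   # text that would be sent
received = csv_to_matrix(payload)       # what the coordinator reads
```

Plain CSV text of this kind can be read by R, Stata, SPSS or Python alike, which is the interoperability benefit described above.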
The same principles of quality apply to a distributed analysis as to a pooled analysis. For example, the distributed data sources should have the same information model. Poor-quality data and models will affect a distributed analysis in the same way they would affect a pooled analysis.
A federated analysis is a subtype of distributed analysis. A federated analysis has more technical requirements than a distributed analysis. For example, the software packages (e.g., R, Stata, SPSS) used to run the analysis across distributed data sources must be the same, and the data partition at each data node needs to represent the information using the same technology and data types (e.g., one data centre cannot use MSSQL and another Oracle for a federated analysis). Distributed analysis has less stringent technical requirements for use across data partitions.
The requirements are very similar to those of a pooled analysis. At a minimum, the chosen analytical methods and models must be coherent, and the chosen data source must conform to the requirements of the chosen model (e.g., in terms of model assumptions and missing data). The predictors to be included must be selected, which can be done through exploratory analyses at one of the nodes (i.e., an HDRN Canada data centre).
The main requirement is to have a research question that can be answered using one of the supported analytical methods (currently linear, logistic and Cox regression are supported, with and without the use of weights) (see GitHub Distributed Analysis Resources). If the method required is not currently available, please contact dash@hdrn.ca as we are expanding the library of available methods. All the usual data governance and secure environment policies continue to apply.
Currently, distributed analysis is supported for linear regression, logistic regression and Cox regression analysis (see GitHub Distributed Analysis Resources). If the method required is not currently available, please contact dash@hdrn.ca as we are expanding the library of available methods.
The Data Access Support Hub (DASH) at HDRN Canada is a coordination service offered by 14 data centres across Canada. DASH provides facilitation support for researchers doing multi-regional research in Canada. Upon submitting your project to DASH for review, you can receive a free feasibility assessment and estimation of costs for your request. Once you confirm funding and that you would like to proceed with a formal data access request, DASH works with your team to coordinate the necessary approvals, agreements and other requirements to access the data in accordance with local policy and legislation. Finally, once the necessary approvals and agreements are in place, and your project’s data and analytical plan is available, HDRN Canada data centres can complete the distributed analysis of the data (where services are offered). For more information, please contact dash@hdrn.ca.
Upon submitting your project to DASH for review, the coordination services up to and including receiving a project feasibility assessment and cost estimate are free. Once a research team confirms that they would like to submit a formal data access request, local data centres respond to these requests on a cost-recovery basis. While there is a range of costs associated with health dataset preparation, analytic and other services provided by HDRN Canada data centres, the decision to perform a distributed analysis has a negligible impact on the costs.
Contact dash@hdrn.ca and we will connect you to the right place!
Please see the following online resources: