Customer Story

Automation takes data query directly to the field

An innovative JMP® add-in helps North Dakota State University’s Breeding Pipeline accelerate development through rapid data query and statistical analysis.

North Dakota State University

Challenge: Accelerate query and analysis processes needed by plant breeders to assess and compare crop varieties as data is collected, rather than after weeks of manual work fraught with the potential for mistakes and missteps.
Solution: To better manage accelerating flows of phenotypic, genomic, drone and sensor data, data and pipeline managers in the Department of Plant Sciences at North Dakota State University sought a customizable statistical tool capable of extracting and running mixed model analysis on unbalanced data from trials across a range of locations and time periods. They selected JMP and used it to create the Ag Query Hub, or Ag.Q.Hub, an interactive add-in developed in JMP Scripting Language.
Results: Ag.Q.Hub streamlines computational components of crop breeding workflows by guiding users through template-driven SQL queries and mixed model analysis of field data. It’s a multifaceted new tool that breeders say dramatically accelerates selection processes by automating standardized tasks, guiding researchers through a user-friendly graphical interface and offering built-in analysis functionality.

As impacts from climate change and population growth increase, governments, markets and crop breeders race to upgrade and improve performance and production across agricultural sectors. At breeding programs around the world, scientists are working to develop crop varieties that can be produced in larger quantities, crossed with more consistency and bred to be more resistant to environmental stress. One such program is based at North Dakota State University (NDSU), which operates one of the largest public plant-breeding programs in the United States.

NDSU has forged strong academic-industrial partnerships and pioneered new uses for agricultural systems technology in a field that has historically relied on a combination of tradition and intuition. Recognizing the potential for automation and more sophisticated data capture to improve scientific outcomes, the NDSU Breeding Pipeline team employs cross-functional database management strategies to ensure high-quality data collection and migration, conducted in compliance with protocols designed to support technology development and transfer. The team supports 10 NDSU-based breeding programs working with a dozen crop species, as well as research and extension centers located across North Dakota.

“One of our key functions is the development of new commercial crop products,” says NDSU Breeding Pipeline Database Manager Tom Walk, PhD. “And the effectiveness of those efforts depends on rapid identification of the genetic potential of tested genotypes.”

Recognizing the power of automation to accelerate analysis, Walk and his colleagues in the Breeding Pipeline Database Management team have developed a sophisticated dashboard known as Ag Query Hub, or Ag.Q.Hub. The system, unlike even the best prefab analytics tools on the market, allows users to extract and run mixed model analysis on unbalanced data from multi-environment trials – and do so across years of data taken from different locations.

This wide-ranging functionality is possible, Walk says, because Ag.Q.Hub was built in JMP®.

With new sensing technologies come vast quantities of data

Breeding pipelines offer crop scientists the opportunity to select for desired traits from an extensive database of phenotypic data spanning a multitude of varieties, grown at multiple locations over many years. “Plant breeding generates a lot of data,” Walk explains. “You don’t just get lucky and pick the right one. You have to go through thousands of test varieties to find one or two that will be worth releasing.”

Take, for example, a breeder looking to select barley varieties for small-scale brewers. “There are several attributes to look at from an agronomic standpoint, and several genetic traits from a quality standpoint,” explains Database Manager Ana María Heilman-Morales, PhD. Just 10 or 15 years ago, the information needed to make these decisions would likely have been collected by hand in field notebooks; the industry has since made significant strides by embracing technological innovation.

Case in point: automation. The advent of automated sensing technologies in crop science, in addition to the use of drones and cameras, has resulted in a flood of new data. Breeding databases have grown exponentially, and plant breeders have had to adjust their processes to accommodate a substantially more data-rich environment.

As an educational institution, NDSU has a mandate to keep pace with its commercial counterparts. “As a university, we have the added challenge of training researchers and students for the jobs of today and tomorrow, and we have to stay relevant in the technology we use,” Walk says. Learning to work with digital systems is a key skill for NDSU’s crop science students.  

With fewer resources than its private-sector counterparts, however, NDSU’s Breeding Pipeline has long needed a new data management architecture that could optimize data insight without adding extra time or cost to the process. “We needed a way for users to access their data quickly, including all data from the past up to data that they've just collected in the last day or the last few hours,” Walk explains. The team also needed a better way to streamline analysis, Heilman-Morales adds: “We wanted to build a robust type of analysis using mixed-linear-model analysis or spatial analysis…. That's when Ag.Q.Hub was born.”

From days and weeks to minutes and hours

Ag.Q.Hub is a proprietary automated system that integrates data retrieval and analysis for the rapid identification of the genetic potential of tested genotypes. It was developed in JMP by Walk and Heilman-Morales, who used JMP Scripting Language (JSL) to create a customized solution. “The idea is to stay relevant,” says Heilman-Morales. And Ag.Q.Hub does that and more.

The Ag.Q.Hub tool queries Microsoft SQL Server databases built in AGROBASE, a powerful tool on its own that covers most aspects of plant breeding data operations from managing stocks to experimental design to an array of data analysis choices. Ag.Q.Hub boosts analytical capabilities through an interface that enables users to select search parameters via interactive forms. Returned data can then be run through mixed-model analyses directly in JMP or in ASReml analysis launched from the JMP add-in.
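The form-driven query step described above can be illustrated with a minimal sketch. It is written in Python rather than JSL and uses an in-memory SQLite database as a stand-in for SQL Server. The table name `plot_data`, its columns, the trait names and all sample values are hypothetical and invented for illustration; nothing here is taken from Ag.Q.Hub or AGROBASE.

```python
import sqlite3

# Hypothetical whitelist of trait columns a user may pick in a form.
ALLOWED_TRAITS = {"yield_kg_ha", "protein_pct"}


def build_query(trait, years, locations):
    """Turn form selections into a parameterized SQL query.

    Column names cannot be bound as parameters, so the trait is checked
    against a whitelist; years and locations are bound with placeholders.
    """
    if trait not in ALLOWED_TRAITS:
        raise ValueError(f"unknown trait: {trait}")
    if not years or not locations:
        raise ValueError("select at least one year and one location")
    year_marks = ", ".join("?" for _ in years)
    loc_marks = ", ".join("?" for _ in locations)
    sql = (
        f"SELECT genotype, location, year, {trait} AS value "
        f"FROM plot_data "
        f"WHERE year IN ({year_marks}) AND location IN ({loc_marks}) "
        f"AND {trait} IS NOT NULL "
        f"ORDER BY genotype, location, year"
    )
    return sql, [*years, *locations]


def demo_connection():
    """Create an in-memory database holding made-up, unbalanced trial data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plot_data (genotype TEXT, location TEXT, "
        "year INTEGER, yield_kg_ha REAL, protein_pct REAL)"
    )
    rows = [
        ("G1", "LocA", 2021, 4100.0, 12.1),
        ("G1", "LocB", 2021, 3900.0, None),  # missing protein value
        ("G2", "LocA", 2021, 4300.0, 11.8),
        ("G2", "LocA", 2022, 4500.0, 12.4),
        ("G3", "LocB", 2022, 3700.0, 13.0),  # G3 tested in one trial only
    ]
    conn.executemany("INSERT INTO plot_data VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def run_query(conn, trait, years, locations):
    """Build the query from the selections and return the matching rows."""
    sql, params = build_query(trait, years, locations)
    return conn.execute(sql, params).fetchall()
```

Calling `run_query(demo_connection(), "yield_kg_ha", [2021, 2022], ["LocA"])` returns one tuple per matching plot record. Binding the values as parameters, rather than pasting them into the SQL string, keeps a template reusable across users and avoids malformed queries from free-text input.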

Furthermore, Ag.Q.Hub automatically splits tables to expedite data management. It outputs initial results to tabs containing histograms and univariate statistics for data quality assessment and uses object-oriented design to separate components into JSL functions for modular capabilities. It’s a multipurpose tool, unique in its ability to extract and run mixed-model analysis on unbalanced data from multi-environment trials. While AGROBASE and its recent successor, Genovix, have many built-in tools to cover a range of analyses, custom-built JMP add-ins such as Ag.Q.Hub give researchers a way to supplement those proprietary tools with analytics pipelines that, Walk says, off-the-shelf database management tools do not cover.
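As a rough illustration of the split-then-summarize step, the Python sketch below groups returned rows by one column and computes univariate statistics for each group, with a simple outlier check. The function names, the row layout (genotype, location, year, value) and the use of Tukey fences are assumptions made for this example, not a description of how Ag.Q.Hub is implemented in JSL.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev


def split_by(rows, key_index):
    """Split a list of row tuples into sub-tables keyed by one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


def summarize(values):
    """Univariate statistics for a quick data quality check."""
    n = len(values)
    return {
        "n": n,
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values) if n > 1 else 0.0,  # stdev needs two or more points
        "min": min(values),
        "max": max(values),
    }


def flag_outliers(values, k=1.5):
    """Return values outside Tukey fences (quartiles -/+ k * IQR).

    This is one common rule of thumb, chosen here for illustration only.
    """
    if len(values) < 4:
        return []  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def quality_report(rows, key_index=1, value_index=3):
    """Summarize the value column separately for each group."""
    report = {}
    for key, group in split_by(rows, key_index).items():
        values = [r[value_index] for r in group]
        report[key] = {**summarize(values), "outliers": flag_outliers(values)}
    return report
```

Passing a list of (genotype, location, year, value) tuples to `quality_report` yields one summary per location. In an interactive tool, each summary would typically sit next to a histogram so that a suspicious value can be checked before any model is fitted.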

“Tasks like compiling data over years were practically impossible in the past,” he says. “We’ve cut the time from days or weeks to a few minutes or hours. And that opens up analysis that wasn’t possible before.”

With efficiency and reproducibility, tangible results

The value of JMP and Ag.Q.Hub is not limited to doing things that weren’t previously possible. There is also significant value in the efficiency and reproducibility the system introduces to the pipeline, Walk and Heilman-Morales say.

Each of NDSU’s 10 breeding programs employs technicians and graduate students, each of whom accesses and analyzes data on a daily basis. By automating common analyses that are repeated regularly by different individuals, Ag.Q.Hub has made standard queries more reproducible and less prone to user error or variability.

Walk also points to the advantages of connecting JMP add-ins and tools directly to the pipeline’s data sources. That can save steps and allow for data organization and visualization that would be difficult if each data source had to be viewed separately.

As a JSL scripted tool, Ag.Q.Hub also offers all the benefits of the dynamic data visualization features of JMP software. “We can color-code [our outputs] to point out outliers,” Walk says. “Users prefer these visualizations because it accelerates decision making. They can see if the distributions fit their assumptions and get an overview of their experimental designs and how they ran their experiments.”


Database Manager Ana María Heilman-Morales (right) demos Ag.Q.Hub at a JMP Discovery Summit.

Seeding the future

The Breeding Pipeline’s success with Ag.Q.Hub has also opened doors for the team to explore other applications for automation. “There are a number of modeling methods in JMP, and we have other algorithms with machine learning and artificial intelligence,” Walk says. “We’ve taken steps to save researchers a lot of time by automating parts of the process.”

“For example, if genomic sequence data is kept in separate flat files, it takes a lot of effort to compile them and put them in a format you can import into JMP Genomics,” he explains. “But if it’s organized from the start to go into a relational database, you can develop standard queries to efficiently access the data.” The team is also testing ways to incorporate JMP into their genomic selection work – a project that may ultimately see the use of Ag.Q.Hub or another custom add-in to support the department’s use of JMP Genomics.

In the meantime, the team is pleased with their progress. “Our model is pretty new for a public university system,” says Heilman-Morales. “Having a team like us and these automated ways of doing things is even newer. Other universities have started to ask us, ‘What are you doing, and what tools are you using?’” It’s a recognition that customized tools like Ag.Q.Hub are indeed seeding the future of crop science.
