In JMP 17 and earlier, the JSL functions Python Set() and Python Send() copied a data table by exporting it to a temporary CSV file and then calling pandas.read_csv() on that file. The data was then represented in Python as a pandas DataFrame.
As of JMP 18, the jmp.DataTable Python object is a live reference to the JMP data table rather than a copy. The table can be manipulated and edited from Python, and the changes are immediately visible to JSL and to JMP platforms.
For scripts built upon the expectation that a data table sent to Python will be a pandas DataFrame object, the script dt2pandas.py is available in JMP’s Sample Scripts directory. This script makes use of the jmp package’s run_jsl() function and pandas to provide a to_pandas( data_table ) function, which replicates the behavior of writing a temporary CSV file to disk and importing it with pandas.read_csv().
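The CSV round trip described above can be sketched with pandas alone. This is a minimal illustration of the pattern, not the contents of dt2pandas.py: the helper name csv_round_trip and the sample values are hypothetical, and an ordinary DataFrame stands in for the exported JMP table.

```python
import os
import tempfile

import pandas as pd


def csv_round_trip(frame):
    """Write a DataFrame to a temporary CSV file and read it back as a copy.

    Hypothetical helper that mimics the copy-through-CSV pattern only.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)  # pandas reopens the file by path
    try:
        frame.to_csv(path, index=False)
        return pd.read_csv(path)
    finally:
        os.remove(path)  # clean up the temporary file


# Stand-in data; in JMP the rows would come from the exported data table.
source = pd.DataFrame(
    {"name": ["KATIE", "LOUISE"], "age": [12, 12], "height": [59.0, 61.0]}
)
copy = csv_round_trip(source)
print(copy.head())
```

The result is an independent copy: later edits to it do not reach the original, which is the behavior that scripts written for JMP 17 and earlier rely on.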
The dt2pandas.py script can be copied into JMP’s site-packages directory and used as if it were an installed package. The dt2pandas module also makes the following JSL possible when dt2pandas.py is in the same directory as the JSL script file.
Names Default to Here(1);
dt = open("$SAMPLE_DATA/Big Class.jmp");
Python Send(dt);
Python Submit("\[
import dt2pandas as j2pd
df = j2pd.to_pandas(dt)
print(df.head())
]\");
A pandas DataFrame can also be built by looping across the columns of a data table. This happens in memory rather than through the file system, which avoids the loss of precision that can occur when binary data is converted to text and back again. It also avoids the large increase in data size that occurs when binary values are written as text.
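The two costs of a text round trip can be illustrated with the standard library. This is a sketch under stated assumptions: the source does not say how numbers are formatted in the exported CSV file, so the six significant digits used here are an arbitrary choice, and the sample values are made up.

```python
import struct

values = [1 / 3, 0.1 + 0.2, 123456.789012345]

# Assumption: text written with 6 significant digits (arbitrary for illustration).
as_text = [format(v, ".6g") for v in values]
round_tripped = [float(s) for s in as_text]

# True where the value read back differs from the original double.
lossy = [a != b for a, b in zip(values, round_tripped)]

# Full-precision text (repr) reads back exactly but needs more bytes
# than the 8-byte binary representation of each double.
full_text = ",".join(repr(v) for v in values)
exact = [float(s) for s in full_text.split(",")]
binary = struct.pack(f"{len(values)}d", *values)

print("changed by 6-digit text:", lossy)
print("text bytes:", len(full_text.encode()), "binary bytes:", len(binary))
```

Limited-digit text changes the values, while full-precision text preserves them at the cost of more bytes per value than the binary form.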
The following example creates a pandas DataFrame from a JMP data table object by looping through the data table’s columns to create equivalent DataFrame columns.
This script is also available by opening the JMP2pandas.py script in JMP’s Sample Scripts directory.
import jmp
import numpy as np
import pandas as pd
# Create a Pandas DataFrame from a JMP DataTable directly
dt = jmp.open(jmp.SAMPLE_DATA + "Big Class.jmp")
df = pd.DataFrame()
for idx in range( len(dt) ):
    print(dt[idx].name)
    if dt[idx].dtype == jmp.DataType.Numeric:
        # Create a numeric column directly using Python's Buffer Protocol
        col = np.array( dt[idx] )
        df[ dt[idx].name ] = col.tolist() # Make it a list for Pandas.
    elif dt[idx].dtype == jmp.DataType.Character:
        # Create a list by iterating through values
        col = list()
        for i in range ( dt.nrows ):
            col.append(dt[idx][i])
        # Build the character column from list
        df[ dt[idx].name ] = col
    else:
        print( dt[idx].dtype )
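For experimenting where the jmp package is not available, the same column loop can be tried against a stand-in table. The sketch below is an assumption-laden variation, not the sample script: fake_table is a hypothetical dictionary in which array.array objects play the role of numeric columns (they expose the buffer protocol, as the numeric columns above are described as doing) and plain lists play the role of character columns. It also collects the columns first and builds the DataFrame once, rather than assigning one column at a time.

```python
from array import array

import numpy as np
import pandas as pd

# Hypothetical stand-in for a JMP data table (values are made up).
fake_table = {
    "name": ["KATIE", "LOUISE", "JANE"],
    "age": array("d", [12, 12, 12]),
    "height": array("d", [59, 61, 55]),
}

columns = {}
for name, values in fake_table.items():
    if isinstance(values, array):
        # Numeric stand-in: NumPy reads the buffer directly, with no text conversion.
        columns[name] = np.array(values)
    else:
        # Character stand-in: copy the values into a list.
        columns[name] = list(values)

# Build the DataFrame once from the collected columns.
df = pd.DataFrame(columns)
print(df)
```

Collecting the columns in a dictionary keeps the loop free of DataFrame mutations; whether that matters for performance depends on the table size and is not measured here.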