JDataFrame API: Convert Your Java Data Model Into An R Data Frame In Minutes!
A few years ago I was working on a software solution that required data in Java to be converted into a data frame (data.frame) in the R Project for Statistical Computing. The data model for this project was fairly involved, and I wanted to convert it into JSON in Java so that the resulting JSON could be turned into a data frame in R using the RJSONIO package. The JDataFrame API solves this pain point by allowing users, for example Data Engineers or Data Scientists, to map an existing Java data model to an instance of JDataFrame, which can then be exported as JSON.
JDataFrame API for Data Engineers and Data Scientists: A Simple One Minute Example
Here we'll review what's required to use the JDataFrame API by looking at a simple three-step working example in Java; a fourth step is also included, which requires an R script and the RJSONIO package.
Step One: Create an instance of the JDataFrameBuilder
In the first step we create an instance of the JDataFrameBuilder class, as shown below. This is simple enough; for the columns added later, the key (the column header) should always be of type String and the values should be of type Object.
JDataFrameBuilder dataFrameBuilder = new JDataFrameBuilder(
    new JDataFrame(),
    new RemoteAdapter());
JDataFrame source code is available on Bitbucket
The JDataFrame framework is available on Bitbucket and, at the moment, must be built from source. If you would like a prebuilt Maven dependency, or are experiencing build issues, please send me an email.
Enterprise Data Adapter
The JDataFrame framework depends on the Enterprise Data Adapter framework, which is also available on Bitbucket.
Step Two: Add Columns
In the second step, we'll add a column, which consists of a header and its data.
dataFrameBuilder.addColumn("Some Header", new Object[] {"data point 1", "data point 2"});
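Because the end goal is an R data.frame, it can help to confirm that every column carries the same number of values before serializing. The following is a minimal sketch of such a check and is not part of the JDataFrame API: the class ColumnLengthCheckSketch and the method requireEqualLengths are hypothetical names, and this article does not say whether JDataFrame performs a similar check itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the JDataFrame API. */
final class ColumnLengthCheckSketch {

    /** Returns the shared row count, or throws if any column has a different length. */
    static int requireEqualLengths(Map<String, Object[]> columns) {
        int expected = -1; // -1 means "no column seen yet"
        for (Map.Entry<String, Object[]> column : columns.entrySet()) {
            int length = column.getValue().length;
            if (expected == -1) {
                expected = length;
            } else if (length != expected) {
                throw new IllegalArgumentException(
                    "Column '" + column.getKey() + "' has " + length
                        + " values, expected " + expected);
            }
        }
        return Math.max(expected, 0); // an empty map has zero rows
    }

    public static void main(String[] args) {
        Map<String, Object[]> columns = new LinkedHashMap<>();
        columns.put("Code", new Object[] {"WV", "VA"});
        columns.put("Description", new Object[] {"West Virginia", "Virginia"});
        System.out.println("rows: " + requireEqualLengths(columns));
    }
}
```

Running the check on the Java side reports a mismatch together with the name of the offending column, before the data ever reaches R.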
Step Three: Invoke the serialize method
In the third step, we convert this data into a String containing JSON which can be deserialized using the RJSONIO package.
String result = (String) dataFrameBuilder.serialize();
Below we can see an example of the JSON returned from the call to the serialize method.
{"Some Header":["data point 1","data point 2"]}
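To make the shape of that JSON concrete, here is a minimal, dependency-free sketch that writes a map of columns as a single JSON object with one array per header. It is not how JDataFrame serializes its data; ColumnJsonSketch and its methods are hypothetical names used only for illustration, and the sketch assumes any numeric values are finite.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration only; not part of the JDataFrame API. */
final class ColumnJsonSketch {

    /** Wraps a string in quotes and escapes characters that JSON forbids in raw form. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Renders one cell: null, finite numbers and booleans stay bare, the rest is quoted. */
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    /** Renders {"header":[v1,v2,...],...}, keeping the map's iteration order. */
    static String toJson(Map<String, Object[]> columns) {
        StringJoiner object = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object[]> column : columns.entrySet()) {
            StringJoiner array = new StringJoiner(",", "[", "]");
            for (Object v : column.getValue()) {
                array.add(value(v));
            }
            object.add(quote(column.getKey()) + ":" + array);
        }
        return object.toString();
    }

    public static void main(String[] args) {
        Map<String, Object[]> columns = new LinkedHashMap<>();
        columns.put("Some Header", new Object[] {"data point 1", "data point 2"});
        // prints {"Some Header":["data point 1","data point 2"]}
        System.out.println(toJson(columns));
    }
}
```

The column-oriented layout (one array per header) is what lets the R side treat each JSON member as one data frame column.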
Step Four: Convert the JSON to an R data.frame
In the fourth and final step in this example, we convert this JSON into an R data.frame using the RJSONIO package.
temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
In the next section, we’ll explore a slightly more complicated example.
JDataFrame API for Data Engineers and Data Scientists: A More Involved Two Minute Example
Here we include an example of the JDataFrame in action. The gist is written in R script and relies on the Groovy programming language, rJava, and RJSONIO. This script populates an instance of JDataFrame with two columns (US state codes and descriptions), converts that data into JavaScript Object Notation (JSON), and then uses the RJSONIO package to convert the resulting JSON into an R data frame.
Groovy is an excellent language to embed in R script, since we get all of the benefits of using Java without the compilation step that would be required if we wrote this logic in pure Java code.
In the next step, we use the RJSONIO fromJSON function to convert JSON content to R objects, and then we convert the temp result into a new data.frame using the as.data.frame function.
The JDataFrame framework relies on the Google GSON API for serialization.
groovyJars <- list (
"C:/development/projects/rGroovy/groovy.jars/groovy-2.4.5-indy.jar",
"C:/development/projects/rGroovy/groovy.jars/ivy-2.4.0.jar"
)
options(GROOVY_JARS=groovyJars)
library(rGroovy)
rGroovy::Initialize()
script <- paste (
    "@Grab(group='com.coherentlogic.r.integration', module='jdataframe-core', version='0.8.5-RELEASE')",
    "import com.coherentlogic.r.integration.data.frame.JDataFrameBuilder",
    "def codes = ['WV', 'VA'] as String[]",
    "def descriptions = ['West Virginia', 'Virginia'] as String[]",
    "return new JDataFrameBuilder()",
    ".addColumn('Code', codes)",
    ".addColumn('Description', descriptions)",
    ".toJson()", sep="\n")
json <- rGroovy::Evaluate(groovyScript = script)
temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
tempDF
JDataFrame API for Data Engineers and Data Scientists: Five Minute Example
See the R COBOL Data Integration project (RCOBOLDI) for a more involved example where we convert COBOL CopyBook data into an R data frame. In particular, see the ReadCopyBookAsDataFrame R function (line 145) as well as the JCopyBookConverter.readCopyBookAsJDataFrameBuilder method (lines 69 to 139).
private JDataFrameBuilder readCopyBookAsJDataFrameBuilder(
    AbstractLineReader reader,
    LayoutDetail layout,
    String font,
    IUpdateFieldName updateFldName
) throws IOException {
    JDataFrameBuilder result =
        new JDataFrameBuilder(
            new JDataFrame(),
            new RemoteAdapter()
        );
    AbstractLine line;
    RecordDetail rec = layout.getRecord(0);
    for (int ctr = 1; ctr < rec.getFieldCount(); ctr++) {
        var header = (updateFldName.updateName(rec.getField(ctr).getName()));
        result.getDataFrame().addOrReturnExistingColumn(header);
    }
    int idx;
    while ((line = reader.read()) != null) {
        idx = line.getPreferredLayoutIdx();
        if (0 <= idx) {
            for (int ctr = 1; ctr < layout.getRecord(idx).getFieldCount(); ctr++) {
                var header = rec.getField(ctr).getName();
                var value = line.getFieldValue(idx, ctr);
                var formattedValue = (String) null;
                if (value != null && !(value.isSpaces() || value.isLowValues() || value.isHighValues()) && value.isFieldPresent())
                    formattedValue = value.asString();
                result
                    .getDataFrame()
                    .addOrReturnExistingColumn(header)
                    .addValues(new String[]{formattedValue});
            }
        }
    }
    return result;
}
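The method above reads records row by row but stores them column by column, looking up (or creating) the column for each header and appending one value to it. A minimal stand-alone sketch of that accumulation pattern, using only the Java standard library, might look like the following. The class ColumnAccumulatorSketch is hypothetical; only the method name addOrReturnExistingColumn is borrowed from the code above, and its behaviour here is an assumption based on that name rather than on the JDataFrame source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration only; not the JDataFrame implementation. */
final class ColumnAccumulatorSketch {

    // LinkedHashMap keeps columns in the order their headers were first seen.
    private final Map<String, List<String>> columns = new LinkedHashMap<>();

    /** Returns the column for this header, creating an empty one on first use. */
    List<String> addOrReturnExistingColumn(String header) {
        return columns.computeIfAbsent(header, h -> new ArrayList<>());
    }

    /** Appends one row, one value per header; null marks a missing field. */
    void addRow(String[] headers, String[] values) {
        if (headers.length != values.length) {
            throw new IllegalArgumentException("headers and values differ in length");
        }
        for (int i = 0; i < headers.length; i++) {
            addOrReturnExistingColumn(headers[i]).add(values[i]);
        }
    }

    Map<String, List<String>> columns() {
        return columns;
    }

    public static void main(String[] args) {
        ColumnAccumulatorSketch frame = new ColumnAccumulatorSketch();
        String[] headers = {"Code", "Description"};
        frame.addRow(headers, new String[] {"WV", "West Virginia"});
        frame.addRow(headers, new String[] {"VA", null}); // missing description
        System.out.println(frame.columns());
    }
}
```

Appending null for an absent field, as the original method does with formattedValue, keeps every column the same length, so the rows stay aligned.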
Below we can see the example output from the R COBOL Data Integration package for the R Project, which uses the JDataFrame API.
JDataFrame API: An Inappropriate Data Engineering Use Case Example
The JDataFrame API is not appropriate for moving very large data payloads from Java to the R Project for Statistical Computing. For substantially large payloads, the Data Engineer or Data Scientist may be better served by loading the data into Apache Spark, for example, and then loading it from Spark into R.
Open-Source JDataFrame API Conclusion
I hope you find the open source JDataFrame API useful. If you do use it, it would be great to hear where it's being used and how it has benefited your project, so please leave a comment below.