JDataFrame API: Convert a Java data model into an R data frame in minutes!
A few years ago I was developing a software solution that required data in Java to be converted into a data frame (data.frame) in the R Project for Statistical Computing. The data model for this project was fairly involved, and I wanted to convert the data into JSON in Java so that the resulting JSON could be turned into a data frame in R using the RJSONIO package. The JDataFrame API solves this pain point by allowing users, such as Data Engineers or Data Scientists, to map an existing Java data model to an instance of JDataFrame, which can then be exported as JSON.
JDataFrame API for Data Engineers and Data Scientists: A Simple One Minute Example
In this pure Java example we create an instance of JDataFrameBuilder, add a column with a header and some data, and then call the serialize method, which returns JSON.
JDataFrameBuilder dataFrameBuilder = new JDataFrameBuilder(
    new JDataFrame(),
    new RemoteAdapter());
dataFrameBuilder.addColumn("Some Header", new Object[] {"data point 1", "data point 2"});
String result = (String) dataFrameBuilder.serialize();
Below we can see an example of the JSON returned from the call to the serialize method.
{"Some Header":["data point 1","data point 2"]}
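The JSON above is column-oriented: each header maps to an array holding that column's values, which is the shape that converts naturally into a data frame. To make that layout concrete, here is a minimal sketch that uses only the Java standard library. It is not how JDataFrame serializes data (the framework relies on GSON); the class name ColumnJsonSketch and its methods are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; this class is not part of the JDataFrame API.
public class ColumnJsonSketch {

    // Wraps text in double quotes, escaping backslashes and double quotes.
    // Control characters are not handled in this sketch.
    public static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders header -> values as {"header":["v1","v2"]}, keeping the
    // map's iteration order. A null value is written as a JSON null.
    public static String toJson(Map<String, List<String>> columns) {
        return columns.entrySet().stream()
            .map(e -> quote(e.getKey()) + ":" + e.getValue().stream()
                .map(v -> v == null ? "null" : quote(v))
                .collect(Collectors.joining(",", "[", "]")))
            .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("Some Header", List.of("data point 1", "data point 2"));
        // prints {"Some Header":["data point 1","data point 2"]}
        System.out.println(toJson(columns));
    }
}
```

A LinkedHashMap is used here so the columns come out in the order they were added, which keeps the column order predictable on the R side.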
Further below, we’ll explore a slightly more involved example.
JDataFrame source code is available on Bitbucket
The JDataFrame framework is available on Bitbucket and can currently be built from source. If you would like a prebuilt Maven dependency, or are experiencing build issues, please send me an email.
Enterprise Data Adapter
The JDataFrame framework uses the Enterprise Data Adapter framework, which is also available on Bitbucket.
JDataFrame API for Data Engineers and Data Scientists: A More Involved Two Minute Example
Here we include an example of JDataFrame in action. The gist is written in R and relies on the Groovy programming language, rJava, and RJSONIO. The script populates an instance of JDataFrame with two columns (US state codes and descriptions), converts that data into JavaScript Object Notation (JSON), and then uses the RJSONIO package to convert the resulting JSON into an R data frame.
Groovy is an excellent language to embed in an R script, since we get all of the benefits of Java without the compilation step that pure Java code would require.
Once the script has been evaluated, we use the RJSONIO fromJSON function to convert the JSON content into R objects, and then convert that temporary result into a new data.frame using the as.data.frame function.
The JDataFrame framework relies on the Google GSON API for serialization.
groovyJars <- list (
"C:/development/projects/rGroovy/groovy.jars/groovy-2.4.5-indy.jar",
"C:/development/projects/rGroovy/groovy.jars/ivy-2.4.0.jar"
)
options(GROOVY_JARS=groovyJars)
library(rGroovy)
rGroovy::Initialize()
script <- paste (
  "@Grab(group='com.coherentlogic.r.integration', module='jdataframe-core', version='0.8.5-RELEASE')",
  "import com.coherentlogic.r.integration.data.frame.JDataFrameBuilder",
  "def codes = ['WV', 'VA'] as String[]",
  "def descriptions = ['West Virginia', 'Virginia'] as String[]",
  "return new JDataFrameBuilder()",
  ".addColumn('Code', codes)",
  ".addColumn('Description', descriptions)",
  ".toJson()", sep="\n")
json <- rGroovy::Evaluate(groovyScript = script)
temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
tempDF
JDataFrame API for Data Engineers and Data Scientists: Five Minute Example
See the R COBOL Data Integration project (RCOBOLDI) for a more involved example where we convert COBOL CopyBook data into an R data frame. In particular, see the ReadCopyBookAsDataFrame R function (line 145) as well as the JCopyBookConverter.readCopyBookAsJDataFrameBuilder method (lines 69-139).
private JDataFrameBuilder readCopyBookAsJDataFrameBuilder(
    AbstractLineReader reader,
    LayoutDetail layout,
    String font,
    IUpdateFieldName updateFldName
) throws IOException {

    JDataFrameBuilder result =
        new JDataFrameBuilder(
            new JDataFrame(),
            new RemoteAdapter()
        );

    AbstractLine line;

    RecordDetail rec = layout.getRecord(0);

    // Register one column per field of the first record (starting at index 1),
    // using the updated field name as the header.
    for (int ctr = 1; ctr < rec.getFieldCount(); ctr++) {
        var header = (updateFldName.updateName(rec.getField(ctr).getName()));
        result.getDataFrame().addOrReturnExistingColumn(header);
    }

    int idx;

    // Read each line and append every field value to its column.
    while ((line = reader.read()) != null) {
        idx = line.getPreferredLayoutIdx();
        if (0 <= idx) {
            for (int ctr = 1; ctr < layout.getRecord(idx).getFieldCount(); ctr++) {
                var header = rec.getField(ctr).getName();
                var value = line.getFieldValue(idx, ctr);
                String formattedValue = null;

                // Fields that are blank, low-values, high-values, or absent stay null.
                if (value != null && !(value.isSpaces() || value.isLowValues() || value.isHighValues()) && value.isFieldPresent()) {
                    formattedValue = value.asString();
                }

                result
                    .getDataFrame()
                    .addOrReturnExistingColumn(header)
                    .addValues(new String[]{formattedValue});
            }
        }
    }
    return result;
}
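The core idea in the method above is a pivot from row-oriented records to column-oriented storage: each record is read once, and each of its fields is appended to the column that matches the field's header. The following is a simplified sketch of that pattern using only the Java standard library. The class RowToColumnSketch and its methods are hypothetical stand-ins, not the JDataFrame or JRecord APIs, and the whitespace check is only an assumed approximation of the blank-field handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a column store; not the JDataFrame class.
public class RowToColumnSketch {

    private final Map<String, List<String>> columns = new LinkedHashMap<>();

    // Returns the column for this header, creating it on first use.
    public List<String> addOrReturnExistingColumn(String header) {
        return columns.computeIfAbsent(header, key -> new ArrayList<>());
    }

    // Appends one record, field by field, to the matching columns.
    // Missing or whitespace-only fields are stored as null.
    public void addRecord(String[] headers, String[] fields) {
        for (int i = 0; i < headers.length; i++) {
            String value = i < fields.length ? fields[i] : null;
            boolean blank = value == null || value.trim().isEmpty();
            addOrReturnExistingColumn(headers[i]).add(blank ? null : value);
        }
    }

    public Map<String, List<String>> getColumns() {
        return columns;
    }

    public static void main(String[] args) {
        String[] headers = {"Code", "Description"};
        RowToColumnSketch frame = new RowToColumnSketch();
        frame.addRecord(headers, new String[] {"WV", "West Virginia"});
        frame.addRecord(headers, new String[] {"VA", "   "});
        // prints {Code=[WV, VA], Description=[West Virginia, null]}
        System.out.println(frame.getColumns());
    }
}
```

Storing a null for a blank field, rather than skipping it, keeps every column the same length, which a data frame requires.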
Below we can see the example output from the R COBOL Data Integration package for the R Project, which uses the JDataFrame API.
JDataFrame API: An Inappropriate Data Engineering Use Case Example
The JDataFrame API is not appropriate for moving very large data payloads from Java to the R Project for Statistical Computing. For large payloads, the Data Engineer or Data Scientist may be better served by loading the data into Apache Spark, for example, and then loading it from Spark into R.
Open-Source JDataFrame API Conclusion
I hope you find the open source JDataFrame API useful. If you do use it, it would be great to hear where it’s being used and how it has benefited your project, so please leave a comment below.