JDataFrame: Convert a Java data model into an R data frame in minutes!

A few years ago I was developing a solution that required data in Java to be converted into a data frame (data.frame) in the R Project for Statistical Computing. The data model for this project was fairly involved, and I wanted to convert the data into JSON in Java so that the resulting JSON could be turned into a data frame in R using the RJSONIO package.

The JDataFrame framework solves this pain point: it represents data in Java in a form that can be exported as JSON and then converted into a properly formatted data frame in R using the rJava and RJSONIO packages.

JDataFrame source code is available on Bitbucket

The JDataFrame framework is available on Bitbucket and, for the moment, is built from source. If you would like a prebuilt Maven dependency, or are experiencing build issues, please send me an email.

Enterprise Data Adapter

The JDataFrame framework depends on the Enterprise Data Adapter framework, which is also available on Bitbucket.

JDataFrame: Two Minute Example

Here is an example of the JDataFrame in action. The gist is written as an R script and relies on Groovy, rJava, and RJSONIO. The script populates an instance of JDataFrame with two columns (codes and descriptions), converts that data into JavaScript Object Notation (JSON), and then uses the RJSONIO package to convert the JSON into an R data frame.

This simple example runs in R and uses the rJava package (for Java) together with the rGroovy package (for Groovy).

The Groovy script adds some simple US state codes and state descriptions as columns to an instance of the JDataFrameBuilder and then this data is converted into JSON.

Groovy is an excellent language to embed in an R script, since we get all of the benefits of using Java without the compilation step that would be required if we wrote this logic as pure Java code.

In the next step, we use the RJSONIO fromJSON function to convert JSON content to R objects, and then we convert the temp result into a new data.frame using the as.data.frame function.

The JDataFrame framework relies on the Google GSON API for serialization.

				
groovyJars <- list (
    "C:/development/projects/rGroovy/groovy.jars/groovy-2.4.5-indy.jar",
    "C:/development/projects/rGroovy/groovy.jars/ivy-2.4.0.jar"
)

options(GROOVY_JARS=groovyJars)

library(rGroovy)

rGroovy::Initialize()

# Join the Groovy source lines with newlines so that each statement sits on its own line.
script <- paste (
    "@Grab(group='com.coherentlogic.r.integration', module='jdataframe-core', version='0.8.5-RELEASE')",
    "import com.coherentlogic.r.integration.data.frame.JDataFrameBuilder",
    "def codes = ['WV', 'VA'] as String[]",
    "def descriptions = ['West Virginia', 'Virginia'] as String[]",
    "return new JDataFrameBuilder()",
    ".addColumn('Code', codes)",
    ".addColumn('Description', descriptions)",
    ".toJson()", sep="\n")

json <- rGroovy::Evaluate(groovyScript = script)

temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
tempDF
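
To make the data flow more concrete, here is a minimal Java sketch of the kind of column-oriented JSON that the R side can consume. It does not use JDataFrame or GSON: the class ColumnJsonSketch and its quote and toJson methods are hypothetical names invented for this illustration, and the JSON layout (an object whose keys are column names and whose values are equal-length arrays) is an assumption rather than the documented output of JDataFrameBuilder.toJson().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; this is not part of the JDataFrame API.
class ColumnJsonSketch {

    // Wraps a value in double quotes and escapes characters that JSON forbids in raw form.
    static String quote(String value) {
        if (value == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes the columns, in insertion order, as one JSON object of arrays.
    static String toJson(Map<String, String[]> columns) {
        StringJoiner object = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String[]> entry : columns.entrySet()) {
            StringJoiner array = new StringJoiner(",", "[", "]");
            for (String value : entry.getValue()) {
                array.add(quote(value));
            }
            object.add(quote(entry.getKey()) + ":" + array);
        }
        return object.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> columns = new LinkedHashMap<>();
        columns.put("Code", new String[] {"WV", "VA"});
        columns.put("Description", new String[] {"West Virginia", "Virginia"});
        // Prints {"Code":["WV","VA"],"Description":["West Virginia","Virginia"]}
        System.out.println(toJson(columns));
    }
}
```

Given a string of that shape, RJSONIO::fromJSON should return a named list of character vectors, which as.data.frame then turns into a data frame with one column per key.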
				
			

JDataFrame: Five Minute Example

See the RCOBOLDI (R COBOL Data Integration) project for a more involved example, where COBOL CopyBook data is converted into an R data frame. In particular, see the ReadCopyBookAsDataFrame R function (line 145) as well as the JCopyBookConverter.readCopyBookAsJDataFrameBuilder method (lines 69 – 139).

				
private JDataFrameBuilder<String, String[]> readCopyBookAsJDataFrameBuilder(
    AbstractLineReader reader,
    LayoutDetail layout,
    String font,
    IUpdateFieldName updateFldName
) throws IOException {

    JDataFrameBuilder<String, String[]> result =
        new JDataFrameBuilder<String, String[]>(
            new JDataFrame<String, String[]>(),
            new RemoteAdapter<String, String[]>()
        );

    AbstractLine line;

    RecordDetail rec = layout.getRecord(0);

    for (int ctr = 1; ctr < rec.getFieldCount(); ctr++) {

        var header = (updateFldName.updateName(rec.getField(ctr).getName()));

        result.getDataFrame().addOrReturnExistingColumn(header);
    }

    int idx;

    while ((line = reader.read()) != null) {

        idx = line.getPreferredLayoutIdx();

        if (0 <= idx) {

            for (int ctr = 1; ctr < layout.getRecord(idx).getFieldCount(); ctr++) {

                var header = rec.getField(ctr).getName();

                var value = line.getFieldValue(idx, ctr);

                var formattedValue = (String) null;

                if (value != null && !(value.isSpaces() || value.isLowValues() || value.isHighValues()) && value.isFieldPresent())
                    formattedValue = value.asString();

                result
                    .getDataFrame()
                    .addOrReturnExistingColumn(header)
                    .addValues(new String[]{formattedValue});
            }
        }
    }

    return result;
}
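
The core pattern in the method above is turning row-oriented records into named columns, creating each column the first time its header is seen and storing null for values that are effectively empty. Here is a minimal, self-contained sketch of that pattern using only the JDK. The class ColumnAccumulatorSketch is hypothetical, its addOrReturnExistingColumn method merely borrows the name used above and is not the JDataFrame implementation, and a whitespace-only check stands in for the isSpaces, isLowValues, and isHighValues tests, which is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the JDataFrame implementation.
class ColumnAccumulatorSketch {

    // LinkedHashMap keeps the columns in the order in which they were first seen.
    private final Map<String, List<String>> columns = new LinkedHashMap<>();

    // Returns the column for this header, creating an empty one on first use.
    List<String> addOrReturnExistingColumn(String header) {
        return columns.computeIfAbsent(header, key -> new ArrayList<>());
    }

    // Appends one record, one value per header; blank values are stored as null.
    void addRow(String[] headers, String[] values) {
        for (int i = 0; i < headers.length; i++) {
            String value = values[i];
            boolean blank = value == null || value.trim().isEmpty();
            addOrReturnExistingColumn(headers[i]).add(blank ? null : value);
        }
    }

    Map<String, List<String>> columns() {
        return columns;
    }
}
```

Because every record appends exactly one entry to every column, the columns stay the same length, which is what a data frame needs on the R side.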
				
			

thospfuller

I write mostly about technology-related subjects including, for example, Kubernetes, AWS, Software Engineering, and Technical Search Engine Optimization (Technical SEO). I'm originally from Chicago, IL, and currently reside in Reston, VA.