JDataFrame API: Convert a Java data model into an R data frame in minutes!

A few years ago I was developing a software solution that required data in Java to be converted into a data frame (data.frame) in the R Project for Statistical Computing. The data model for this project was fairly involved, so I wanted to convert the data into JSON in Java, in such a way that the resulting JSON could be turned into a data frame in R using the RJSONIO package. The JDataFrame API solves this pain point by allowing users, for example Data Engineers or Data Scientists, to map an existing Java data model to an instance of JDataFrame, which can then be exported as JSON.

JDataFrame API for Data Engineers and Data Scientists: A Simple One Minute Example

In this pure Java example, we create an instance of JDataFrameBuilder, add a column with a header and some data, and then call the serialize method, which returns JSON.

				
JDataFrameBuilder<String, Object[]> dataFrameBuilder = new JDataFrameBuilder<String, Object[]>(
    new JDataFrame<String, Object[]>(),
    new RemoteAdapter<String, Object[]>());

dataFrameBuilder.addColumn("Some Header", new Object[] {"data point 1", "data point 2"});

String result = (String) dataFrameBuilder.serialize();
				
			

Below we can see an example of the JSON returned from the call to the serialize method.

				
{"Some Header":["data point 1","data point 2"]}
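To make the shape of that output concrete, here is a minimal, JDK-only sketch of the same column-oriented layout: an insertion-ordered map from column header to values, written out as a single JSON object whose keys are the headers and whose values are arrays. ColumnJsonSketch and its methods are hypothetical names used only for illustration; this is not how JDataFrame serializes (the framework relies on Gson for that), and the hand-rolled escaping covers only quotes, backslashes, and control characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a column-oriented data frame that serializes to JSON. */
public class ColumnJsonSketch {

    // LinkedHashMap keeps the columns in the order they were added.
    private final Map<String, List<String>> columns = new LinkedHashMap<>();

    /** Adds (or extends) a column with the given header and values. */
    public ColumnJsonSketch addColumn(String header, String... values) {
        columns.computeIfAbsent(header, h -> new ArrayList<>()).addAll(Arrays.asList(values));
        return this;
    }

    /** Writes {"header":["v1","v2"],...}; null values become JSON null. */
    public String toJson() {
        StringBuilder sb = new StringBuilder("{");
        boolean firstColumn = true;
        for (Map.Entry<String, List<String>> column : columns.entrySet()) {
            if (!firstColumn) sb.append(',');
            firstColumn = false;
            sb.append(quote(column.getKey())).append(":[");
            List<String> values = column.getValue();
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) sb.append(',');
                String v = values.get(i);
                sb.append(v == null ? "null" : quote(v));
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String json = new ColumnJsonSketch()
            .addColumn("Some Header", "data point 1", "data point 2")
            .toJson();
        System.out.println(json); // {"Some Header":["data point 1","data point 2"]}
    }
}
```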
				
			

Later in this article, we’ll explore a slightly more complicated example.

JDataFrame source code is available on Bitbucket

The JDataFrame framework is available on Bitbucket and, at the moment, can be built from source. If you would like a prebuilt Maven dependency or are experiencing build issues, please send me an email.

Enterprise Data Adapter

The JDataFrame framework also uses the Enterprise Data Adapter framework, which is likewise available on Bitbucket.

JDataFrame API for Data Engineers and Data Scientists: A More Involved Two Minute Example

Here is an example of JDataFrame in action. The gist is written in R and relies on the Groovy programming language (via the rGroovy package), rJava, and RJSONIO. The script populates an instance of JDataFrame with two columns (US state codes and descriptions), converts that data into JavaScript Object Notation (JSON), and then uses the RJSONIO package to convert the resulting JSON into an R data frame.

Groovy is an excellent language to embed in an R script because it offers the benefits of Java without the compilation step that writing this logic in pure Java would require.

In the final step, we use the RJSONIO fromJSON function to convert the JSON content into R objects, and then convert that temp result into a new data.frame using the as.data.frame function.

The JDataFrame framework relies on the Google Gson library for serialization.

				
groovyJars <- list (
    "C:/development/projects/rGroovy/groovy.jars/groovy-2.4.5-indy.jar",
    "C:/development/projects/rGroovy/groovy.jars/ivy-2.4.0.jar"
)

options(GROOVY_JARS=groovyJars)

library(rGroovy)

rGroovy::Initialize()

# Build the Groovy script; each element becomes one line of the script.
script <- paste (
    "@Grab(group='com.coherentlogic.r.integration', module='jdataframe-core', version='0.8.5-RELEASE')",
    "import com.coherentlogic.r.integration.data.frame.JDataFrameBuilder",
    "def codes = ['WV', 'VA'] as String[]",
    "def descriptions = ['West Virginia', 'Virginia'] as String[]",
    "return new JDataFrameBuilder()",
    ".addColumn('Code', codes)",
    ".addColumn('Description', descriptions)",
    ".toJson()", sep="\n")

json <- rGroovy::Evaluate(groovyScript = script)

# Convert the JSON into R objects, then into a data frame.
temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
tempDF
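Because an R data frame is rectangular, the JSON handed to as.data.frame should carry the same number of values in every column. In R, data.frame (which as.data.frame uses for a list) recycles a shorter column only when its length divides the row count evenly and otherwise stops with an error, so ragged columns either fail or are silently padded with repeated values. A cheap guard on the Java side, before export, might look like the sketch below. RectangularCheck and rowCount are hypothetical helpers, not part of the JDataFrame API; the sample columns mirror the Code/Description example above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical guard: confirms every column has the same length before export. */
public final class RectangularCheck {

    private RectangularCheck() {}

    /**
     * Returns the shared row count, or throws if any column's length differs
     * from the first column's length.
     */
    public static int rowCount(Map<String, List<String>> columns) {
        int expected = -1;
        String firstHeader = null;
        for (Map.Entry<String, List<String>> column : columns.entrySet()) {
            int size = column.getValue().size();
            if (expected < 0) {
                expected = size;
                firstHeader = column.getKey();
            } else if (size != expected) {
                throw new IllegalArgumentException(
                    "Column '" + column.getKey() + "' has " + size + " values but '"
                        + firstHeader + "' has " + expected);
            }
        }
        return Math.max(expected, 0); // a frame with no columns has zero rows
    }

    public static void main(String[] args) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("Code", List.of("WV", "VA"));
        columns.put("Description", List.of("West Virginia", "Virginia"));
        System.out.println(rowCount(columns)); // 2
    }
}
```

Failing fast in Java gives a clearer error message than a mismatch discovered later inside R.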
				
			

JDataFrame API for Data Engineers and Data Scientists: Five Minute Example

See the R COBOL Data Integration project (RCOBOLDI) for a more involved example in which we convert COBOL copybook data into an R data frame. In particular, see the ReadCopyBookAsDataFrame R function (line 145) as well as the JCopyBookConverter.readCopyBookAsJDataFrameBuilder method (lines 69 to 139).

				
private JDataFrameBuilder<String, String[]> readCopyBookAsJDataFrameBuilder(
    AbstractLineReader reader,
    LayoutDetail layout,
    String font,
    IUpdateFieldName updateFldName
) throws IOException {

    JDataFrameBuilder<String, String[]> result =
        new JDataFrameBuilder<String, String[]>(
            new JDataFrame<String, String[]>(),
            new RemoteAdapter<String, String[]>()
        );

    RecordDetail rec = layout.getRecord(0);

    // Create one (initially empty) column per field header, using the
    // updated field name.
    for (int ctr = 1; ctr < rec.getFieldCount(); ctr++) {

        var header = updateFldName.updateName(rec.getField(ctr).getName());

        result.getDataFrame().addOrReturnExistingColumn(header);
    }

    AbstractLine line;

    while ((line = reader.read()) != null) {

        int idx = line.getPreferredLayoutIdx();

        // Skip lines that do not match any record in the layout.
        if (idx < 0)
            continue;

        RecordDetail current = layout.getRecord(idx);

        for (int ctr = 1; ctr < current.getFieldCount(); ctr++) {

            // Apply the same name update so values land in the columns created above.
            var header = updateFldName.updateName(current.getField(ctr).getName());

            var value = line.getFieldValue(idx, ctr);

            // Spaces, LOW-VALUES, HIGH-VALUES, and absent fields are exported as null.
            String formattedValue = null;

            if (value != null
                && !(value.isSpaces() || value.isLowValues() || value.isHighValues())
                && value.isFieldPresent())
                formattedValue = value.asString();

            result
                .getDataFrame()
                .addOrReturnExistingColumn(header)
                .addValues(new String[]{formattedValue});
        }
    }

    return result;
}
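The method above pivots record-oriented copybook data into columns: it creates one column per field header, then, for every record it reads, appends one value per field, using null when the field is spaces, LOW-VALUES, or HIGH-VALUES. The sketch below shows that pivot in isolation with plain Java strings, so it runs without JRecord or JDataFrame. RowToColumnSketch, pivot, and isMissing are hypothetical names; isMissing only approximates JRecord's isSpaces/isLowValues/isHighValues checks by treating a field that is null, empty, or made up entirely of spaces, entirely of U+0000, or entirely of U+00FF characters as missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, JRecord-free sketch of the row-to-column pivot. */
public class RowToColumnSketch {

    // Assumed stand-ins for COBOL LOW-VALUES and HIGH-VALUES after decoding.
    private static final char LOW_VALUE = (char) 0x00;
    private static final char HIGH_VALUE = (char) 0xFF;

    /** True when a field should be exported as null. */
    static boolean isMissing(String field) {
        return field == null
            || allChars(field, ' ')
            || allChars(field, LOW_VALUE)
            || allChars(field, HIGH_VALUE);
    }

    // Vacuously true for "", so empty fields also count as missing.
    private static boolean allChars(String s, char c) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != c) return false;
        }
        return true;
    }

    /** Pivots rows (one String[] per record) into insertion-ordered columns. */
    static Map<String, List<String>> pivot(String[] headers, List<String[]> rows) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        // Create every column up front, before any rows are read.
        for (String header : headers) {
            columns.put(header, new ArrayList<>());
        }
        // Append exactly one value per column per row, keeping columns the same length.
        for (String[] row : rows) {
            for (int i = 0; i < headers.length; i++) {
                String raw = i < row.length ? row[i] : null; // a short row is padded with null
                columns.get(headers[i]).add(isMissing(raw) ? null : raw);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String[] headers = {"Code", "Description"};
        List<String[]> rows = List.of(
            new String[] {"WV", "West Virginia"},
            new String[] {"VA", "        "}); // an all-spaces field becomes null
        System.out.println(pivot(headers, rows));
        // prints {Code=[WV, VA], Description=[West Virginia, null]}
    }
}
```

Appending a value (or null) for every field of every record is what keeps the resulting columns aligned row by row.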
				
			

Below is example output from the R COBOL Data Integration (RCOBOLDI) package for the R Project, which uses the JDataFrame API.

Figure: An example of the JDataFrame API being used in the RCOBOLDI package for the R Project. The RStudio output shows the required steps (initializing the package and calling the ReadCopyBookAsDataFrame function) followed by the resulting data frame.

JDataFrame API: An Inappropriate Data Engineering Use Case Example

The JDataFrame API is not appropriate for moving very large data payloads from Java to the R Project for Statistical Computing, since the data is held in memory and serialized into a single JSON string. For substantially large payloads, it may be better for the Data Engineer or Data Scientist to load the data into a system such as Apache Spark and then load it from Spark into R.

Open-Source JDataFrame API Conclusion

I hope you find the open-source JDataFrame API useful. If you do use it, I'd love to hear where it's being used and how it has benefited your project, so please leave a comment below.

thospfuller

I am a Web Design, Technical SEO, and WordPress Specialist based in Northern Virginia. I am interested in software development, content engineering, and business. I'm originally from Chicago, IL, and currently reside in Reston, VA.