JDataFrame API: Convert Your Java Data Model Into An R Data Frame In Minutes!

A few years ago I was developing a software solution that required data in Java to be converted into a data frame (data.frame) in the R Project for Statistical Computing. The data model for this project was fairly involved, and I wanted to convert this data into JSON in Java so that the resulting JSON could be turned into a data frame in R using the RJSONIO package. The JDataFrame API solves this pain point by allowing the user, for example a Data Engineer or Data Scientist, to map an existing Java data model to an instance of JDataFrame, which can then be exported as JSON.

JDataFrame API for Data Engineers and Data Scientists: A Simple One Minute Example

Here we’ll review what’s required to use the JDataFrame API by looking at a simple three-step working example in Java; a fourth step, which requires an R script and the RJSONIO package, is also included.

Step One: Create an instance of the JDataFrameBuilder

In the first step we create an instance of the JDataFrameBuilder class. This is simple enough: the key should always be of type String and the value should be an array type (Object[] in this example), as shown below.

				
JDataFrameBuilder<String, Object[]> dataFrameBuilder = new JDataFrameBuilder<String, Object[]>(
    new JDataFrame<String, Object[]>(),
    new RemoteAdapter<String, Object[]>());
				
			

JDataFrame source code is available on Bitbucket

The JDataFrame framework is available on Bitbucket and can currently be built from source. If you would like a prebuilt Maven dependency, or are experiencing build issues, please send me an email.

Enterprise Data Adapter

The JDataFrame framework also uses the Enterprise Data Adapter framework, which is also available on Bitbucket.

Step Two: Add Columns

In the second step, we’ll add a column which consists of both the header as well as the data.

				
dataFrameBuilder.addColumn("Some Header", new Object[] {"data point 1", "data point 2"});
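
R's as.data.frame expects the columns it receives to have compatible lengths, so when several columns are added it may be worth checking them on the Java side before serializing. The following is a minimal sketch using only the standard library; the class ColumnLengthCheckSketch and the method requireEqualLengths are hypothetical names introduced for illustration and are not part of the JDataFrame API, and the article does not say whether JDataFrameBuilder performs a similar check itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JDataFrame: verifies that every column
// holds the same number of values before the data is serialized.
class ColumnLengthCheckSketch {

    // Returns the shared row count (0 for an empty map), or throws if any
    // column differs in length from the first one.
    static int requireEqualLengths(Map<String, Object[]> columns) {
        int expected = -1;
        for (Map.Entry<String, Object[]> entry : columns.entrySet()) {
            int length = entry.getValue().length;
            if (expected < 0) {
                expected = length; // the first column sets the expected length
            } else if (length != expected) {
                throw new IllegalArgumentException(
                    "Column '" + entry.getKey() + "' has " + length
                        + " values, expected " + expected);
            }
        }
        return Math.max(expected, 0);
    }

    public static void main(String[] args) {
        Map<String, Object[]> columns = new LinkedHashMap<>();
        columns.put("Some Header", new Object[] {"data point 1", "data point 2"});
        columns.put("Another Header", new Object[] {1, 2});
        System.out.println("rows: " + requireEqualLengths(columns));
    }
}
```

A check like this fails fast in Java rather than surfacing later as an error in R.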
				
			

Step Three: Invoke the serialize method

In the third step, we convert this data into a String containing JSON which can be deserialized using the RJSONIO package.

				
String result = (String) dataFrameBuilder.serialize();
				
			

Below we can see an example of the JSON returned from the call to the serialize method.

				
{"Some Header":["data point 1","data point 2"]}
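
To make the shape of this JSON concrete, here is a small sketch that produces the same column-oriented layout (one JSON array per header) using only the Java standard library. It is an illustration of the output format only: JDataFrame itself relies on Gson for serialization, and the class ColumnJsonSketch and its methods are hypothetical names that do not exist in the framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: mimics the column-oriented JSON shape using the standard
// library. This is not how JDataFrame serializes data internally.
class ColumnJsonSketch {

    // Wraps a string in double quotes and escapes characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Numbers and booleans are written bare, null as JSON null, and anything
    // else as a quoted string. Non-finite doubles are not handled here.
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Emits one JSON object whose keys are headers and whose values are arrays.
    static String toJson(Map<String, Object[]> columns) {
        StringJoiner object = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object[]> entry : columns.entrySet()) {
            StringJoiner array = new StringJoiner(",", "[", "]");
            for (Object v : entry.getValue()) array.add(value(v));
            object.add(quote(entry.getKey()) + ":" + array);
        }
        return object.toString();
    }

    public static void main(String[] args) {
        Map<String, Object[]> columns = new LinkedHashMap<>();
        columns.put("Some Header", new Object[] {"data point 1", "data point 2"});
        // prints {"Some Header":["data point 1","data point 2"]}
        System.out.println(toJson(columns));
    }
}
```

A LinkedHashMap is used so that columns come out in the order they were added, which keeps the column order of the resulting data frame predictable.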
				
			

Step Four: Convert the JSON to an R data.frame

In the fourth and final step in this example, we convert this JSON into an R data.frame using the RJSONIO package.

				
temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
				
			

In the next section, we’ll explore a slightly more complicated example.

JDataFrame API for Data Engineers and Data Scientists: A More Involved Two Minute Example

Here we include an example of the JDataFrame in action. The gist is written in R and relies on the Groovy programming language, rJava, and RJSONIO. The script populates an instance of JDataFrame with two columns (US state codes and descriptions), converts that data into JavaScript Object Notation (JSON), and then uses the RJSONIO package to convert the resulting JSON into an R data frame.

Groovy is an excellent language to embed in an R script since we get all of the benefits of using Java without the compilation step that pure Java code would require.

Once the Groovy script has been evaluated, we use the RJSONIO fromJSON function to convert the JSON content to R objects, and then convert the temp result into a new data.frame using the as.data.frame function.

The JDataFrame framework relies on the Google GSON API for serialization.

				
# Location of the Groovy jars required by rGroovy; adjust these paths for your machine.
groovyJars <- list (
    "C:/development/projects/rGroovy/groovy.jars/groovy-2.4.5-indy.jar",
    "C:/development/projects/rGroovy/groovy.jars/ivy-2.4.0.jar"
)

options(GROOVY_JARS=groovyJars)

library(rGroovy)

rGroovy::Initialize()

# Groovy script which builds a two-column JDataFrame and returns it as JSON.
# The lines must be joined with a newline ("\n") to form a valid script.
script <- paste (
    "@Grab(group='com.coherentlogic.r.integration', module='jdataframe-core', version='0.8.5-RELEASE')",
    "import com.coherentlogic.r.integration.data.frame.JDataFrameBuilder",
    "def codes = ['WV', 'VA'] as String[]",
    "def descriptions = ['West Virginia', 'Virginia'] as String[]",
    "return new JDataFrameBuilder()",
    ".addColumn('Code', codes)",
    ".addColumn('Description', descriptions)",
    ".toJson()", sep="\n")

# Evaluate the Groovy script; the result is a JSON string.
json <- rGroovy::Evaluate(groovyScript = script)

# Convert the JSON into R objects and then into a data.frame.
temp <- RJSONIO::fromJSON(json)
tempDF <- as.data.frame(temp)
tempDF
				
			

JDataFrame API for Data Engineers and Data Scientists: Five Minute Example

See the R COBOL Data Integration project (RCOBOLDI) for a more involved example where we convert COBOL CopyBook data into an R data frame. In particular, see the ReadCopyBookAsDataFrame R function (line 145) as well as the JCopyBookConverter.readCopyBookAsJDataFrameBuilder method (lines 69–139).

				
private JDataFrameBuilder<String, String[]> readCopyBookAsJDataFrameBuilder(
    AbstractLineReader reader,
    LayoutDetail layout,
    String font,
    IUpdateFieldName updateFldName
) throws IOException {

    // The builder wraps an empty data frame whose columns hold String values.
    JDataFrameBuilder<String, String[]> result =
        new JDataFrameBuilder<String, String[]>(
            new JDataFrame<String, String[]>(),
            new RemoteAdapter<String, String[]>()
        );

    AbstractLine line;

    RecordDetail rec = layout.getRecord(0);

    // Create one column per field of the first record. Note that the loop
    // starts at index 1 and the name is passed through updateFldName.
    for (int ctr = 1; ctr < rec.getFieldCount(); ctr++) {

        var header = (updateFldName.updateName(rec.getField(ctr).getName()));

        result.getDataFrame().addOrReturnExistingColumn(header);
    }

    int idx;

    // Read the file line by line, appending each field value to its column.
    while ((line = reader.read()) != null) {

        idx = line.getPreferredLayoutIdx();

        // Lines without a preferred layout (negative index) are skipped.
        if (0 <= idx) {

            for (int ctr = 1; ctr < layout.getRecord(idx).getFieldCount(); ctr++) {

                // Here the raw field name of the first record is used; it is
                // not passed through updateFldName as it was above.
                var header = rec.getField(ctr).getName();

                var value = line.getFieldValue(idx, ctr);

                var formattedValue = (String) null;

                // Missing fields, spaces, low values and high values are left as null.
                if (value != null && !(value.isSpaces() || value.isLowValues() || value.isHighValues()) && value.isFieldPresent())
                    formattedValue = value.asString();

                result
                    .getDataFrame()
                    .addOrReturnExistingColumn(header)
                    .addValues(new String[]{formattedValue});
            }
        }
    }

    return result;
}
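
The method above follows a common pattern: records arrive row by row, while a data frame wants its values grouped by column. The sketch below shows that row-to-column accumulation with plain Java collections so it can be run without the COBOL or JDataFrame dependencies. The class RowsToColumnsSketch and its pivot method are hypothetical names introduced here, with computeIfAbsent standing in for addOrReturnExistingColumn and blank fields mapped to null in the same spirit as the isSpaces check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the accumulation pattern; not the JDataFrame API.
class RowsToColumnsSketch {

    // Turns row-oriented records into header -> column values.
    static Map<String, List<String>> pivot(String[] headers, List<String[]> rows) {
        Map<String, List<String>> columns = new LinkedHashMap<>();

        // Register every header up front so the column order is stable and
        // the columns exist even when there are no rows.
        for (String header : headers) {
            columns.computeIfAbsent(header, k -> new ArrayList<>());
        }

        for (String[] row : rows) {
            for (int i = 0; i < headers.length; i++) {
                String raw = i < row.length ? row[i] : null;
                // Blank or missing fields become null, so every column still
                // receives exactly one entry per row.
                String value = (raw == null || raw.trim().isEmpty()) ? null : raw;
                columns.computeIfAbsent(headers[i], k -> new ArrayList<>()).add(value);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"WV", "West Virginia"},
            new String[] {"VA", " "});
        System.out.println(pivot(new String[] {"Code", "Description"}, rows));
    }
}
```

Appending a null rather than skipping an empty field keeps all columns the same length, which matters once the result is converted into a data frame.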
				
			

Below we can see the example output from the R COBOL Data Integration package for the R Project, which uses the JDataFrame API.

[Figure: RStudio output demonstrating the JDataFrame API in use via the R COBOL Data Integration package (RCOBOLDI) for the R Project, with pointers to the required steps: initializing the package, calling the ReadCopyBookAsDataFrame function, and reviewing the resulting data frame.]

JDataFrame API: An Inappropriate Data Engineering Use Case Example

The JDataFrame API is not appropriate for moving very large data payloads from Java to the R Project for Statistical Computing. For very large payloads, the Data Engineer or Data Scientist may be better served by loading the data into Apache Spark, for example, and then loading it from Spark into R.

Open-Source JDataFrame API Conclusion

I hope you find the open source JDataFrame API useful. If you do use it, it would be great to hear where it’s being used and how it has benefited your project, so please leave a comment below.

ThosPFuller
