How to read COBOL Data files into the R Project for Statistical Computing

Imagine being able to read COBOL data files in the R Project for Statistical Computing. This would open up a huge opportunity to mine the COBOL copybook and data files held by organizations that still run software written in COBOL.


The R COBOL Data Integration package (RCOBOLDI) delivers exactly this.

If your organization runs one or more COBOL programs and you’d like to analyze the data files those programs produce, this R package may be just what you need.

R COBOL Data Integration on GitHub

The R COBOL Data Integration package (RCOBOLDI) is open-source software released under the GNU LGPL license. The project source code and drat repository for the R COBOL Data Integration package can be found on GitHub.


R COBOL Data Integration on DockerHub

The R COBOL Data Integration package (RCOBOLDI) is also available as a Docker image on DockerHub, which includes all of the required dependencies so that you can try the package with minimal effort.

Complete installation instructions for the rcoboldi-rocker-rstudio Docker image can be found on GitHub, along with an example R script and sample COBOL copybook and data files.

How the R COBOL Data Integration package works

On the Java side of the R COBOL Data Integration project, most of the logic that might interest the reader can be found in the JCopyBookConverter class.

The R COBOL Data Integration package relies on several important open-source components including:

JRecord API: COBOL data file converter

The JRecord API, in tandem with the JDataFrame API, is used to convert a given COBOL data file and COBOL Copybook file to JSON.

Specifically, the JRecord API handles the COBOL copybook-to-Java conversion, turning the records described by the copybook into Java objects.
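To give a feel for the byte-level work such a conversion involves, here is a minimal base R sketch that decodes a single COBOL COMP-3 (packed decimal) field. It is an illustration only, not how JRecord is implemented, and the function name unpack_comp3 is hypothetical. It assumes the common packed-decimal layout: two 4-bit digits per byte, with the low nibble of the last byte holding the sign (0xD for negative, 0xC or 0xF for positive).

```r
# Hypothetical helper (not part of RCOBOLDI or JRecord): decode a COBOL COMP-3
# (packed decimal) field. Each byte holds two 4-bit digits, except that the
# low nibble of the last byte is the sign: 0xD = negative, 0xC/0xF = positive.
unpack_comp3 <- function(bytes, scale = 0L) {
  stopifnot(is.raw(bytes), length(bytes) >= 1L)
  ints <- as.integer(bytes)
  high <- ints %/% 16L  # upper nibble of each byte
  low <- ints %% 16L    # lower nibble of each byte
  n <- length(ints)

  # Interleave the nibbles (high, low, high, low, ...) and drop the final
  # low nibble, which is the sign rather than a digit.
  digits <- as.vector(rbind(high, low))[-(2L * n)]
  sign_nibble <- low[n]
  if (any(digits > 9L)) stop("invalid packed-decimal digit")
  if (sign_nibble < 10L) stop("invalid packed-decimal sign nibble")

  # Combine the digits into one number, then apply the sign and the implied
  # decimal places (the V in a PIC clause such as S9(3)V99).
  value <- sum(digits * 10^(rev(seq_along(digits)) - 1L))
  if (sign_nibble == 13L) value <- -value
  value / 10^scale
}

# Usage: PIC S9(3)V99 COMP-3 holding +123.45 is stored as the bytes 12 34 5C.
unpack_comp3(as.raw(c(0x12, 0x34, 0x5C)), scale = 2L)
# [1] 123.45
```

A real converter also has to handle zoned decimals, binary (COMP) fields, EBCDIC text, and REDEFINES, which is why delegating to a mature library such as JRecord makes sense.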

JDataFrame API

The JDataFrame API is used by the R COBOL Data Integration package to convert the Java objects that were created by the JRecord API into JSON.

The JSON that is returned from the JDataFrame is designed to be convertible to an R data frame using the RJSONIO package.
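The exact JSON layout produced by JDataFrame is not shown in this article, so the following base R sketch assumes a column-oriented shape: once parsed (for example with RJSONIO::fromJSON), the result is a named list holding one equal-length vector per column. The helper name columns_to_data_frame and the sample column names and values are hypothetical.

```r
# Hypothetical helper: turn a parsed, column-oriented JSON result (a named list
# with one equal-length vector per column) into a data frame.
columns_to_data_frame <- function(parsed) {
  stopifnot(is.list(parsed), !is.null(names(parsed)))
  if (length(unique(lengths(parsed))) > 1L) {
    stop("all columns must have the same number of values")
  }
  # optional = TRUE keeps COBOL-style names such as "SALE-PRICE" unchanged
  # instead of letting R rewrite them to "SALE.PRICE".
  as.data.frame(parsed, optional = TRUE, stringsAsFactors = FALSE)
}

# Usage with a made-up stand-in for the parsed JSON:
parsed <- list(`STORE-NO` = c(20, 20, 59), `SALE-PRICE` = c(19.99, 5.50, 7.25))
columns_to_data_frame(parsed)
```

If the JSON turns out to be row-oriented (an array of record objects), the records would instead need to be combined row by row, for example with do.call(rbind, ...).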

RJSONIO: Serialize R Objects to JSON


The snippet below is taken from lines 145 – 168 of the rcoboldi R script file. The Java method readCopyBookAsString is called inside tryCatch, and RJSONIO::fromJSON then parses the returned JSON string so that it can be turned into an R data frame.

# rJava signals Java exceptions as R conditions whose classes include "Throwable".
result <- tryCatch(
    jCopyBookConverter$readCopyBookAsString(copyBookFile, inFile, inputFileStructure, font, copybookDialect),
    Throwable = function(e) {
        stop(paste0(
            "Unable to read the copyBook and convert it into JSON; copyBookFile: ", copyBookFile,
            ", inFile: ", inFile, ", inputFileStructure: ", inputFileStructure, ", font: ", font,
            " -- details follow. Keep in mind that single-record type files can be converted to CSV; ",
            "however, complicated multi-record type files will NOT map to CSV. ", conditionMessage(e)
        ))
    }
)

# Parse the JSON string returned by the Java converter into R objects.
resultAsJson <- RJSONIO::fromJSON(result)

rJava: Low-Level R to Java Interface

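The rJava package is used as the bridge between Java and R and allows us to instantiate Java classes and invoke Java methods from R code.

# Create an instance of the Java converter class. rJava needs a running JVM
# and the JCopyBookConverter class must be on the class path.
jCopyBookConverter <- .jnew('com/coherentlogic/rproject/integration/rcoboldi/api/JCopyBookConverter')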

In the next section, we look at why one might be inclined to read COBOL data files in R.

Why read COBOL data files into the R Project for Statistical Computing?

Being able to read COBOL data files in the R Project for Statistical Computing can be beneficial, especially when dealing with legacy systems or data stored in COBOL-specific formats. Here are several reasons why this capability can be valuable:

Access to Legacy Data

Many organizations have a significant amount of historical data stored in COBOL data files. Being able to read these files in R allows data analysts and researchers to access and analyze valuable legacy data without the need for manual data conversion.

Data Integration

Reading COBOL data files in R facilitates data integration efforts. Data engineers and scientists can combine COBOL data with other data sources and perform comprehensive analyses, thereby creating a more complete picture of your organization’s data.

Avoiding Data Loss

Converting COBOL data to other formats can introduce data loss or corruption. Reading the data directly in its native format minimizes the risk of losing critical information during the conversion process.

Automation

If your organization regularly receives COBOL data files, automating the process of reading and analyzing these files in R can save time and reduce manual data entry errors.

Data Validation

Reading COBOL data files in R allows for data validation and quality checks. Data engineers can apply data cleansing, validation rules, and anomaly detection to identify and address data quality issues.

Historical Analysis

Historical data often contains valuable insights. Reading COBOL data files in R enables you to perform historical trend analysis, forecasting, and other statistical analyses on legacy data sets.

Regulatory Compliance

In industries with stringent regulatory requirements (e.g., finance, healthcare), historical data stored in COBOL formats may need to be analyzed for compliance purposes. R’s analytical capabilities can help with compliance reporting and auditing.

Cost Savings

Reading COBOL data files directly in R can spare organizations the expense of investing in costly data conversion tools or services.

Rapid Prototyping

Data scientists and analysts can quickly prototype and test analytical models using COBOL data in R, allowing for faster development of data-driven solutions.

Interoperability

R integrates well with other data analysis tools and platforms, so being able to read COBOL data files in R makes it easier to share and collaborate on data analysis projects with team members who use different tools.

In summary, the ability to read COBOL data files in the R Project for Statistical Computing can unlock valuable data resources, streamline data integration efforts, improve data quality, and enable organizations to make more informed decisions by leveraging legacy data assets.

R COBOL Data Integration Video Tutorial

The accompanying video provides a brief tutorial on how to use the R COBOL Data Integration package to convert a COBOL copybook file and a COBOL data file into an R data frame.

Precondition

Note that not all copybook files can be converted into data frames. Single-record-type files can be converted to R data frames; however, complicated files with multi-layered record structures will NOT map to R data frames.

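Step One: Install the R COBOL Data Integration (RCOBOLDI) package.

# install.packages("drat")  # run once if the drat package is not installed yet
library(drat)
drat::addRepo("thospfuller")  # register the thospfuller drat repository on GitHub
install.packages("RCOBOLDI")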

The output when the install.packages function is executed should look similar to what we have below.

Installing the RCOBOLDI COBOL Copybook parser plugin from the thospfuller drat repository on GitHub.

Step Two: Initialize the R COBOL Data Integration (RCOBOLDI) package

RCOBOLDI::Initialize()

The output when the Initialize function is called should look something like what we have below.

Initialize the RCOBOLDI COBOL Copybook Converter R Package

Step Three: Call the ReadCopyBookAsDataFrame function to load the Copybook and COBOL data files.

In this step, we load the COBOL data by calling the ReadCopyBookAsDataFrame function and passing it the COBOL copybook file and the COBOL data file, along with the file structure and the character set used by this example.

NOTE: There are several sample COBOL copybook and data files available for testing.

# Arguments: copybook file, COBOL data file, file structure, and character set
# (cp037 is the IBM EBCDIC US code page).
result <- RCOBOLDI::ReadCopyBookAsDataFrame(".../example1/DTAR020.cbl", ".../example1/DTAR020.bin", "Fixed Length Binary", "cp037")

The output when the ReadCopyBookAsDataFrame function is called should look something like what we have below.

Reading the COBOL data file with the ReadCopyBookAsDataFrame function: example output in RStudio, with pointers to the number of rows (headers, 5 in this example) and the total number of columns across all rows (1845 in this example).
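When scripting this step, it can help to check the input paths before handing them to the Java side. The sketch below is a hypothetical convenience wrapper, read_cobol_file, which is not part of RCOBOLDI. It passes its four arguments positionally, in the same order as the Step Three call, and uses that call's file structure and character set as defaults.

```r
# Hypothetical wrapper (not part of RCOBOLDI): fail fast with a readable R error
# if either input file is missing, then delegate to ReadCopyBookAsDataFrame.
read_cobol_file <- function(copybook_path, data_path,
                            file_structure = "Fixed Length Binary",
                            charset = "cp037") {
  paths <- c(copybook_path, data_path)
  missing_paths <- paths[!file.exists(paths)]
  if (length(missing_paths) > 0L) {
    stop("File(s) not found: ", paste(missing_paths, collapse = ", "), call. = FALSE)
  }
  # Same positional argument order as the Step Three example.
  RCOBOLDI::ReadCopyBookAsDataFrame(copybook_path, data_path, file_structure, charset)
}
```

Checking the paths in R first means a mistyped file name produces a short R error instead of a Java exception from deep inside the converter.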

Once the COBOL data file has been loaded, we can inspect the results using the head function.

head(result)

The result for this example should look something like what we have below.

After the COBOL Copybook transformation into an R data frame has completed, we inspect the contents of the result using the head function, which displays the first six rows.
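Beyond head, a compact per-column overview can help when a converted file has many columns. The helper below, describe_columns, is a hypothetical base R sketch rather than part of RCOBOLDI: it lists each column's name, its class, and how many of its values are missing.

```r
# Hypothetical helper: one row per column with its class and the number of
# missing values, handy for a first look at a freshly converted COBOL file.
describe_columns <- function(df) {
  stopifnot(is.data.frame(df))
  data.frame(
    column = names(df),
    class = unname(vapply(df, function(col) class(col)[1L], character(1))),
    n_missing = unname(vapply(df, function(col) sum(is.na(col)), integer(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage with the data frame returned in Step Three:
# describe_columns(result)
```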

Article Conclusion

As of 08.Nov.2023, the RCOBOLDI project on GitHub has 12 stars.

It is important for us to know that people are using the RCOBOLDI package, so if you’re a user and are getting value from this project, please consider giving it a star.

ThosPFuller

