How to read COBOL data files into the R Project for Statistical Computing
Imagine being able to read COBOL data files in the R Project for Statistical Computing. This would open up a huge opportunity for mining the COBOL copybook and data files that exist within organizations that still run COBOL-based software.
The R COBOL Data Integration package (RCOBOLDI) delivers exactly this.
If your organization runs one or more COBOL programs and you'd like to analyze the COBOL files those programs produce, this R package may be just what you need.
R COBOL Data Integration on GitHub
The R COBOL Data Integration package (RCOBOLDI) is open-source software released under the GNU LGPL license. The project source code and drat repository for the R COBOL Data Integration package can be found on GitHub.
Complete installation instructions for the rcoboldi-rocker-rstudio Docker image, including an example R script as well as sample COBOL copybook and data files, can be found on GitHub.
How the R COBOL Data Integration package works
As far as the Java code for the R COBOL Data Integration project goes, most of the logic that might interest the reader can be found in the JCopyBookConverter class.
The R COBOL Data Integration package relies on several important open-source components including:
JRecord API: COBOL data file converter
The JRecord API, in tandem with the JDataFrame API, is used to convert a given COBOL data file and its COBOL copybook file to JSON.
Specifically, the JRecord API handles the COBOL copybook-to-Java conversion.
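To give a feel for the kind of work JRecord does when it applies a copybook to a data file, here is a minimal base-R sketch that decodes a single COBOL COMP-3 (packed decimal) field. This is not JRecord's implementation; the helper unpack_comp3 and the sample bytes are hypothetical, and the sketch assumes the common sign convention in which a final nibble of 0xD marks a negative value.

```r
# Hypothetical helper (not part of RCOBOLDI or JRecord): decode a COBOL
# COMP-3 (packed decimal) field stored in a raw vector.
# Each byte holds two 4-bit digits; the last nibble is the sign
# (0xD = negative, 0xC or 0xF = positive or unsigned).
unpack_comp3 <- function(bytes, scale = 0) {
  v <- as.integer(bytes)
  # Interleave high and low nibbles in byte order.
  nibbles <- as.vector(rbind(bitwShiftR(v, 4L), bitwAnd(v, 15L)))
  sign_nibble <- nibbles[length(nibbles)]
  digits <- nibbles[-length(nibbles)]
  if (any(digits > 9L)) stop("invalid packed-decimal digit")
  # Positional value: the first digit is the most significant.
  value <- sum(digits * 10^(rev(seq_along(digits)) - 1))
  if (sign_nibble == 0xD) value <- -value
  # 'scale' is the number of implied decimal places (the V in the PIC clause).
  value / 10^scale
}

# A PIC S9(9)V99 COMP-3 field: 11 digits plus a sign nibble = 6 bytes.
price <- unpack_comp3(as.raw(c(0x00, 0x00, 0x00, 0x01, 0x23, 0x4C)), scale = 2)
print(price)   # 12.34

# A negative PIC S9(3) COMP-3 value packed into 2 bytes.
qty <- unpack_comp3(as.raw(c(0x12, 0x3D)))
print(qty)     # -123
```

The copybook is what tells a reader like JRecord where each such field starts, how many bytes it occupies, and how many implied decimal places it has; the decoder itself only needs the bytes and the scale.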
JDataFrame API
The JDataFrame API is used by the R COBOL Data Integration package to convert the Java objects that were created by the JRecord API into JSON.
RJSONIO: Serialize R Objects to JSON
The JSON that is returned from the JDataFrame is designed to be convertible to an R data frame using the RJSONIO package.
The snippet below is taken from lines 145–168 of the rcoboldi R script file. The Java method readCopyBookAsString is called inside tryCatch, and RJSONIO::fromJSON then parses the JSON string it returns so that the result can be converted into an R data frame.
tryCatch(
  result <- jCopyBookConverter$readCopyBookAsString(copyBookFile, inFile, inputFileStructure, font, copybookDialect),
  Throwable = function(e) {
    stop(
      paste("Unable to read the copyBook and convert it into JSON; copyBookFile: ", copyBookFile, ", inFile: ", inFile, ", inputFileStructure: ", inputFileStructure, ", font: ", font, " -- details follow. Keep in mind that single-record type files can be converted to CSV however complicated multi-record type files will NOT map to CSV.", e$getMessage(), sep = "")
    )
  }
)
resultAsJson <- RJSONIO::fromJSON(result)
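The following sketch shows, in isolation, how RJSONIO::fromJSON turns a JSON string into something that as.data.frame can consume. The column-oriented JSON layout, the field names, and the values are assumptions for illustration only; the exact JSON that JDataFrame emits may be shaped differently.

```r
# Hypothetical, column-oriented JSON: the real layout produced by JDataFrame
# is an assumption here, as are the field names and values.
json <- '{"ITEM_CODE":["A0001","A0002"],"STORE_NO":[20,21],"SALE_PRICE":[19.99,-4.5]}'

# fromJSON() parses the string into a named R list; arrays whose elements
# share one type are simplified to atomic vectors.
parsed <- RJSONIO::fromJSON(json)

# A named list of equal-length vectors converts directly to a data frame.
df <- as.data.frame(parsed, stringsAsFactors = FALSE)
str(df)
```

Keeping each column as one JSON array is what makes this last step a plain as.data.frame call; a row-oriented layout would instead need the rows bound together after parsing.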
rJava: Low-Level R to Java Interface
The rJava package serves as the bridge between Java and R, allowing us to instantiate Java classes and invoke Java methods from R scripts. For example, the JCopyBookConverter instance is created with .jnew:
jCopyBookConverter <- .jnew('com/coherentlogic/rproject/integration/rcoboldi/api/JCopyBookConverter')
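For readers new to rJava, here is a minimal sketch of the three calls this kind of bridge relies on: .jinit to start the JVM, .jnew to instantiate a class, and .jcall to invoke a method. It deliberately uses only standard JDK classes (java.lang.String and java.lang.System) rather than RCOBOLDI's own classes, so it assumes nothing beyond rJava and a working Java installation.

```r
library(rJava)

# Start the JVM; rJava only initialises it once per R session.
.jinit()

# Instantiate a Java object; class names use JNI-style slashes,
# as in the .jnew call above.
s <- .jnew("java/lang/String", "COPYBOOK")

# Instance method call: "I" is the JNI signature for an int return value.
len <- .jcall(s, "I", "length")

# Static method call: "S" is rJava's shorthand for a java.lang.String return.
version <- .jcall("java/lang/System", "S", "getProperty", "java.version")

print(len)
print(version)
```

The jCopyBookConverter$readCopyBookAsString(...) form used earlier is rJava's higher-level $ syntax, which looks up the method signature via reflection instead of requiring an explicit one as .jcall does.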
In the next section, we look at why one might be inclined to read COBOL data files in R.
Why read COBOL data files into the R Project for Statistical Computing?
Being able to read COBOL data files in the R Project for Statistical Computing can be beneficial, especially when dealing with legacy systems or data stored in COBOL-specific formats. Here are several reasons why this capability can be valuable:
Access to Legacy Data
Many organizations have a significant amount of historical data stored in COBOL data files. Being able to read these files in R allows data analysts and researchers to access and analyze valuable legacy data without the need for manual data conversion.
Data Integration
Reading COBOL data files in R facilitates data integration efforts. Data engineers and scientists can combine COBOL data with other data sources and perform comprehensive analyses, thereby creating a more complete picture of your organization’s data.
Avoiding Data Loss
Converting COBOL data to other formats can introduce data loss or corruption. Reading the data directly in its native format minimizes the risk of losing critical information during the conversion process.
Automation
If your organization regularly receives COBOL data files, automating the process of reading and analyzing these files in R can save time and reduce manual data entry errors.
Data Validation
Reading COBOL data files in R allows for data validation and quality checks. Data engineers can apply data cleansing, validation rules, and anomaly detection to identify and address data quality issues.
Historical Analysis
Historical data often contains valuable insights. Reading COBOL data files in R enables you to perform historical trend analysis, forecasting, and other statistical analyses on legacy data sets.
Regulatory Compliance
In industries with stringent regulatory requirements (e.g., finance, healthcare), historical data stored in COBOL formats may need to be analyzed for compliance purposes. R’s analytical capabilities can help with compliance reporting and auditing.
Cost Savings
Reading COBOL data files directly in R can save organizations money by avoiding the need to invest in costly data conversion tools or services.
Rapid Prototyping
Data scientists and analysts can quickly prototype and test analytical models using COBOL data in R, allowing for faster development of data-driven solutions.
Interoperability
R integrates well with other data analysis tools and platforms, so being able to read COBOL data files in R makes it easier to share and collaborate on data analysis projects with team members who use different tools.
In summary, the ability to read COBOL data files in the R Project for Statistical Computing can unlock valuable data resources, streamline data integration efforts, improve data quality, and enable organizations to make more informed decisions by leveraging legacy data assets.
R COBOL Data Integration Video Tutorial
The accompanying video provides a brief tutorial on how to use the R COBOL Data Integration package to convert a COBOL copybook file and COBOL data file into an R data frame.
Precondition
Note that not all copybook files can be converted into data frames. For example, single-record type files can be converted to R data frames, whereas complicated files with multi-layered record structures will NOT map to R data frames.
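A small base-R illustration of why multi-record files resist this conversion: when records of different types carry different fields, no single set of columns fits every row. The record layouts below are hypothetical, and grouping by record type is just one possible workaround, not something RCOBOLDI is documented to do.

```r
# Hypothetical two-record-type file: "H" (header) and "D" (detail) records,
# told apart by a record-type field, each with its own layout.
records <- list(
  list(type = "H", batch_id = 1L, record_count = 2L),
  list(type = "D", item = "A001", qty = 3L, price = 9.99),
  list(type = "D", item = "A002", qty = 1L, price = 4.50)
)

# The layouts differ, so there is no single column set for one data frame.
# One workaround: group the records by type and build a data frame per type.
types <- vapply(records, function(r) r$type, character(1))
frames <- lapply(split(records, types), function(group) {
  do.call(rbind, lapply(group, as.data.frame, stringsAsFactors = FALSE))
})

str(frames)
```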
Step One: Install the R COBOL Data Integration (RCOBOLDI) package.
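# drat is used to add the RCOBOLDI package repository; install it if needed.
if (!requireNamespace("drat", quietly = TRUE)) install.packages("drat")
library(drat)
drat::addRepo("thospfuller")
install.packages("RCOBOLDI")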
The output when the install.packages function is executed should look similar to what we have below.

Step Two: Initialize the R COBOL Data Integration (RCOBOLDI) package
RCOBOLDI::Initialize()
The output when the Initialize function is called should look something like what we have below.

Step Three: Call the ReadCopyBookAsDataFrame function to load the Copybook and COBOL data files.
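In this step we load the COBOL data by calling the ReadCopyBookAsDataFrame function and passing it the COBOL copybook file, the COBOL data file, the file structure (here "Fixed Length Binary"), and the character set, which the package calls the font (here "cp037").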
NOTE: There are several sample COBOL copybook and data files available for testing on GitHub.
result <- RCOBOLDI::ReadCopyBookAsDataFrame(".../example1/DTAR020.cbl", ".../example1/DTAR020.bin", "Fixed Length Binary", "cp037")
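The third and fourth arguments to ReadCopyBookAsDataFrame, "Fixed Length Binary" and "cp037", describe how the bytes are laid out and which character set they use (cp037 is IBM's US/Canada EBCDIC code page). The base-R sketch below illustrates both ideas: it splits a raw vector into fixed-length records and decodes text with a deliberately partial cp037 table covering only spaces, digits, and upper-case letters. The helper names, the record length, and the sample bytes are hypothetical; in real use JRecord does this decoding for you.

```r
# Hypothetical helper: split a raw vector into records of equal length.
# In practice the bytes might come from readBin(path, "raw", n = file.size(path)).
split_records <- function(bytes, record_length) {
  if (length(bytes) %% record_length != 0) {
    stop("data length is not a multiple of the record length")
  }
  starts <- seq(1, length(bytes), by = record_length)
  lapply(starts, function(i) bytes[i:(i + record_length - 1)])
}

# Partial cp037 (EBCDIC) lookup: space, digits 0-9, and upper-case A-Z only.
cp037_codes <- c(0x40, 0xF0:0xF9, 0xC1:0xC9, 0xD1:0xD9, 0xE2:0xE9)
cp037_chars <- c(" ", as.character(0:9), LETTERS[1:9], LETTERS[10:18], LETTERS[19:26])

# Hypothetical helper: decode EBCDIC bytes using the partial table above.
decode_cp037 <- function(bytes) {
  idx <- match(as.integer(bytes), cp037_codes)
  if (anyNA(idx)) stop("byte not covered by the partial cp037 table")
  paste(cp037_chars[idx], collapse = "")
}

# Two 6-byte records encoded in EBCDIC: "COBOL1" and "DATA 2".
bytes <- as.raw(c(0xC3, 0xD6, 0xC2, 0xD6, 0xD3, 0xF1,
                  0xC4, 0xC1, 0xE3, 0xC1, 0x40, 0xF2))
decoded <- vapply(split_records(bytes, 6), decode_cp037, character(1))
print(decoded)
```

A real fixed-length record usually mixes text fields like these with binary fields such as COMP-3 numbers, which is why the copybook layout, not just the character set, is needed to read the file.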
The output when the ReadCopyBookAsDataFrame function is called should look something like what we have below.

Once the COBOL data file has been loaded, we can inspect the results using the head function.
head(result)
The result for this example should look something like what we have below.

Article Conclusion
As of 08.Nov.2023, the RCOBOLDI project on GitHub has 12 stars.
Stars are a useful signal that people are using the RCOBOLDI package, so if you're a user and are getting value from this project, please consider giving it a star.