What is the effect of Apache Nifi on data transfer and monitoring between multiple platforms?
First of all, this article will try to answer some primary questions: "What is Apache Nifi?" and "Why use Nifi?". Afterward, it will explain the technical details of the tool and its environment.
If you need a standard definition, you can check it here, but I will define it from the perspective of a Nifi user/developer.
Nifi is basically a connection service for data between multiple protocols and platforms. It can transport data from queue to queue, queue to service, database to queue, and so on. The scope of supported platforms and protocols is extremely wide, because its design allows different tasks to be handled by a module (a Processor) or by configuration.
The significant point here is that Nifi provides data transfer between structures and platforms that operate independently of each other. It also helps us track the stages of a data transfer in detail. These facilities allow developers to build integrations quickly.
It has many configuration options for running on an operating system. The most striking feature is its clustering support: it allows simultaneous work on more than one machine, with a master node and multiple sub-nodes, and it coordinates tasks between all these nodes with the embedded Zookeeper.
When we consider it in terms of software design, it can be extended with modules. Such a module is called a "Processor". If you start to develop a Nifi flow for your data transportation, you will often encounter this term.
"Each service has a specific job". This definition may seem familiar. If you have microservice experience, you have definitely heard it before. Nifi's architecture is designed on the same basis.
"Each processor has a specific job". Each processor does the job defined under its responsibility. After performing this task, it writes the result to its output, and the next processor in the flow receives it as input.
Processors share data with each other through a small queue between them. This queue holds flow files until the next processor picks them up, but it has a maximum data limit. If you want to read more, see here.
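This queue-between-processors idea can be illustrated with a bounded queue in plain Java. This is only an analogy (the class and names below are mine, not NiFi's actual implementation): when the connection reaches its limit, the upstream side can no longer hand off flow files until the downstream side consumes some.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BackPressureDemo {
    public static void main(String[] args) {
        // A "connection" between two processors, limited to 2 flow files
        ArrayBlockingQueue<String> connection = new ArrayBlockingQueue<>(2);

        // The upstream processor tries to hand off three flow files
        System.out.println(connection.offer("flowfile-1")); // true
        System.out.println(connection.offer("flowfile-2")); // true
        System.out.println(connection.offer("flowfile-3")); // false: queue full

        // The downstream processor consumes one, freeing space
        connection.poll();
        System.out.println(connection.offer("flowfile-3")); // true
    }
}
```

In Nifi this limit is configurable per connection, and the framework slows the upstream processor down instead of dropping data.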
Nifi comes with many built-in processors. These are designed to cover basic data transfer flows and offer many different configuration options. Some of these processors are as follows: ConsumeAMQP, PublishAMQP, QueryDatabaseTable, PutFile, InvokeHTTP, etc.
You may have jobs that cannot be done with the existing processors. Some workflows need a processor that is specific to your task, so you can develop your own processor with Java.
There are several rules for developing your own processor. The most important one is the JDK version: you should compile your custom processor with the same JDK version that your Nifi instance runs on, or your Nifi instance must run on a higher JDK version than the one used for compilation.
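A quick way to check which JVM version your Nifi instance would run on (assuming you run it with the same default `java` on your PATH) is a one-liner like this; the class name is mine:

```java
public class JdkCheck {
    public static void main(String[] args) {
        // Prints the version of the JVM running this code, e.g. "1.8.0_292";
        // compare it with the JDK you compile your processor with.
        System.out.println(System.getProperty("java.version"));
    }
}
```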
Let's try to develop our own custom processor. In this example, I'll show you how to move data through the flow and manipulate it.
Firstly, add a new Archetype in your IDE if this is your first processor development:
- GroupId: org.apache.nifi
- ArtifactId: nifi-processor-bundle-archetype
- Version: 1.5.0
Don't forget the additional property for Maven:

- Name: artifactBaseName
- Value: customProcessor
After completing all the setup properties, you will have a new project for custom processor development.
I will share the code at the bottom of the article.
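As a preview, a minimal custom processor typically looks like the sketch below. This is an illustrative skeleton built on the nifi-api (the class name, the upper-casing logic, and the relationship wiring are my own choices, not the archetype's exact template), so treat it as a starting point rather than the finished code:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class CustomProcessor extends AbstractProcessor {

    // Flow files are routed to this relationship after a successful run
    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Successfully manipulated flow files")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        // Take one flow file from the incoming queue
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Manipulate the content: upper-case it as a trivial example
        flowFile = session.write(flowFile, (in, out) -> {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                bos.write(buffer, 0, n);
            }
            String upper = new String(bos.toByteArray(), StandardCharsets.UTF_8)
                    .toUpperCase();
            out.write(upper.getBytes(StandardCharsets.UTF_8));
        });
        // Hand the result to the next queue in the flow
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```

The key pattern is the `session.get()` / `session.write(...)` / `session.transfer(...)` trio: read a flow file from the input queue, rewrite its content, and pass it to the next queue via a named relationship.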
After coding, you should build your project with "mvn clean install".
The build output, a .nar file, is placed in the "target" folder of your "nar" module.
Copy this nar file from there into the "lib" folder under the Nifi root directory.
You will also see the other default processors there.
Custom processor development and deployment are now complete.
You can start Apache Nifi and place your custom processor on your flow page.