Problem: An FTP (File Transfer Protocol) server is often used for data exchange in data integration scenarios. Pentaho Kettle has a built-in feature for uploading and downloading files over FTP. We ran into several issues with this feature when downloading large files.
Pentaho Data Integration [Pentaho Kettle] FTP Task to Download Large Files.
Problem 1: Performance is poor when downloading large files from an FTP server with Pentaho Data Integration. Sometimes the connection is lost because the transfer runs for a long time.
Problem 2: We had trouble downloading multiple FTP files with Pentaho.
Solutions for Problem 1:
We first used the Windows ftp command to download the files, but passive mode was required and the Windows 7 ftp client does not support passive-mode transfers.
We then used MOVEit Freely, a free command-line FTP and secure FTP over SSL (FTPS) client. With MOVEit Freely, download performance for large files improved substantially.
Features of the MOVEit Freely command-line FTP client:
• MOVEit Freely was one of the first clients to support all three FTP over SSL modes (TLS-P, TLS-C, and IMPLICIT).
• 50% faster file transfers (on average) resulting from its built-in, automatic GZIP data compression.
• Complete Non-Repudiation via MOVEit’s file integrity checking and MOVEit DMZ’s authentication capabilities, so you can prove who sent a file, who received it, and that it was not changed or corrupted between when the file was sent and received.
Solutions for Problem 2:
To download multiple files with the command-line FTP client, we dynamically create an FTP script for each file to be downloaded.
Transformation 1: Get Zip File Names :
Transformation 2: Set Zip File Names :
Create a Wrapper Job :
In the shell script step, pass the file name as an argument and write the script below, which generates a new script file that connects to the FTP server and downloads the file.
Script to Write in Downloader.bat:
This script takes a file name as its argument and generates a new script file that names the file to download and the local directory to save it in.
%1: the file name to download, passed as an argument in the shell script component.
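As a rough illustration of the script-generation step, here is a minimal Python sketch of the same idea; it is not the batch file itself. The function names, the parameters, and the exact list of FTP commands are assumptions made for the example. It also assumes the command-line client reads the login answers from the first two lines of the script, as the Windows ftp command does with its -s: switch.

```python
from pathlib import Path


def build_ftp_script(file_name: str, user: str, password: str,
                     remote_dir: str, local_dir: str) -> str:
    """Return the text of an FTP command script that fetches one file.

    Hypothetical helper: the command list is an assumption, not the
    original Downloader.bat output.
    """
    lines = [
        user,                 # answers the client's user-name prompt
        password,             # answers the client's password prompt
        "binary",             # zip files must be transferred as binary
        f"lcd {local_dir}",   # local directory the file is saved into
        f"cd {remote_dir}",   # remote directory that holds the file
        f"get {file_name}",   # the single file named by the argument
        "bye",                # close the session
    ]
    return "\n".join(lines) + "\n"


def write_ftp_script(script_path: str, file_name: str, user: str,
                     password: str, remote_dir: str, local_dir: str) -> None:
    """Write the generated commands to script_path (e.g. download.txt)."""
    text = build_ftp_script(file_name, user, password, remote_dir, local_dir)
    Path(script_path).write_text(text)
```

Calling write_ftp_script("download.txt", "sales.zip", ...) once per file name overwrites download.txt each time, which matches the one-script-per-file approach described here. Since the script holds the password in plain text, it is worth deleting after the transfer.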
After the file (download.txt) is generated, it is passed to the FTPS command (the command-line FTP tool) to download the file.
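The hand-off to the command-line client could be sketched in Python as follows. This is an illustration under stated assumptions: the executable is assumed to be on the PATH as ftps, to accept the same -s:<file> switch as the Windows ftp command, and to report failure through a non-zero exit code. The helper names and the host name are hypothetical. Check the client's own documentation before relying on any of this.

```python
import subprocess
from typing import List


def build_ftps_command(script_path: str, host: str,
                       exe: str = "ftps") -> List[str]:
    """Build the command line that runs the generated script.

    Assumption: the client accepts -s:<file>, mirroring Windows ftp.exe.
    """
    return [exe, f"-s:{script_path}", host]


def run_download(script_path: str, host: str) -> bool:
    """Run the client and return True when it exits with code 0.

    Assumption: a non-zero exit code means the transfer failed.
    """
    result = subprocess.run(build_ftps_command(script_path, host))
    return result.returncode == 0
```

For example, run_download("download.txt", "ftp.example.com") would start the client with the generated script, where ftp.example.com is a placeholder host.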
The script generates a new download.txt for each input file name, so the files are downloaded sequentially, one per run.