In this blog we explain how to connect to Excel as a data source in the open source BI product Helical Insight via Apache Drill.
Refer to our other blog to learn how Apache Drill can be used to connect to other kinds of flat files, such as CSV, TSV, Parquet, JSON, Avro and sequence files.
1) Download, install and start Apache Drill. Learn how to install Drill.
2) Configure the middleware to support flat files.
3) Make sure Apache Drill (the middleware) is running, then log in to Helical Insight.
4) From Home, go to Management and click Middleware (refer to the image below).
5) If the Middleware toggle is disabled, enable it.
6) Once it is enabled, click the More option in the URL section and provide the server details, such as IP address and port. This builds a URL such as http://localhost:8085.
Note: If you are using HTTPS/SSL, turn the toggle ON; otherwise keep it OFF (refer to the image below).
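Before logging in, you may want to confirm that Drill itself answers queries. The sketch below posts `SELECT * FROM sys.version` to Drill's REST endpoint `/query.json`. It assumes Drill's web server listens on `localhost` at port 8047 (Drill's default web port), so adjust the base URL to match your installation. The helper names `drill_query_payload` and `is_drill_running` are our own and are not part of Drill or Helical Insight.

```python
import json
import urllib.request


def drill_query_payload(sql: str) -> bytes:
    """Encode a SQL statement in the JSON body that Drill's /query.json expects."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def is_drill_running(base_url: str = "http://localhost:8047", timeout: float = 5.0) -> bool:
    """Return True if Drill answers a trivial query, False otherwise.

    Assumption: Drill's REST API is reachable at base_url (8047 is Drill's
    default web port; your setup may use a different host or port).
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/query.json",
        data=drill_query_payload("SELECT * FROM sys.version"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a reply that is not JSON.
        return False
    # A successful query returns its result rows under the "rows" key.
    return "rows" in body


if __name__ == "__main__":
    print("Drill is running" if is_drill_running() else "Drill is not reachable")
```

If this reports that Drill is not reachable, fix that first; the Middleware settings in the following steps cannot work without a running Drill.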
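The URL built in step 6 follows the usual `scheme://host:port` pattern, with the HTTPS/SSL toggle deciding the scheme. The hypothetical helper below is only an illustration of that rule, not Helical Insight code.

```python
def build_middleware_url(host: str, port: int, use_https: bool = False) -> str:
    """Compose the middleware URL from the IP address/host and port of step 6.

    use_https mirrors the HTTPS/SSL toggle: ON gives https://, OFF gives http://.
    """
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}"


print(build_middleware_url("localhost", 8085))  # http://localhost:8085
```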
7) The Storage Implementation section offers three options:
– Standalone: use this when the middleware and Helical Insight are on the same machine. The Data Warehouse path is created inside the System directory of the hi-repository folder, and all uploaded files are saved there.
– HDFS: use this to upload your flat files into the Hadoop ecosystem. Hadoop must be up and running. HDFS Host is the IP address of the NameNode server, and HDFS Port is the datanode port. The Data Warehouse path is created in the Hadoop datanode and must have read and write access.
– SFTP: use this when Drill/middleware and Helical Insight are installed on different servers. Files are uploaded to the server where Drill is running. If Drill/middleware is installed on a Windows machine, use a Linux-style path for the Data Warehouse path, for example /C:/Users/Helical/your/path/to/datawarehouse.
Note: If you select HDFS or SFTP, you must also provide the respective details, such as IP address and port (refer to the image below).
-> Save the above configuration.
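The choice in step 7, and the Linux-style path that SFTP needs for a Windows-hosted Drill, can be sketched as below. Both helpers are hypothetical illustrations of the rules described above; the precedence of HDFS over the other options is our assumption, and Helical Insight does not expose these functions.

```python
from pathlib import PureWindowsPath


def choose_storage(same_machine: bool, use_hadoop: bool) -> str:
    """Pick a Storage Implementation option following the rules in step 7.

    Assumption: if you want the files in Hadoop, HDFS wins regardless of
    where the middleware runs.
    """
    if use_hadoop:
        return "hdfs"        # files go into HDFS; Hadoop must be running
    if same_machine:
        return "standalone"  # files stay under hi-repository/System
    return "sftp"            # files are copied to the server running Drill


def to_linux_style_path(windows_path: str) -> str:
    r"""Turn a native Windows path into the Linux-style form used with SFTP.

    C:\Users\Helical\dw  ->  /C:/Users/Helical/dw
    """
    if windows_path.startswith("/"):
        return windows_path  # already Linux style, leave unchanged
    # as_posix() swaps backslashes for forward slashes; prefix the leading "/".
    return "/" + PureWindowsPath(windows_path).as_posix()


print(to_linux_style_path(r"C:\Users\Helical\your\path\to\datawarehouse"))
```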
8) Now go to Data Sources, where you will see the “Excel” icon as a data source (it is not shown if the middleware is not enabled).
9) Click Excel -> Create -> Upload file (make sure the file extension is .xlsx) -> enter a Data Source name -> Test Connection.
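Since only .xlsx files are accepted in step 9, a quick local check before uploading can save a failed attempt. An .xlsx workbook is an Office Open XML package, i.e. a ZIP archive containing `[Content_Types].xml`, so the sketch below checks both the extension and that structure. The function name is hypothetical, and the case-insensitive extension check is our assumption; Helical Insight's own validation may differ.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path: str) -> bool:
    """Pre-upload check: .xlsx extension and a valid Office Open XML package.

    An old binary .xls workbook or a CSV renamed to .xlsx fails this test.
    """
    file_path = Path(path)
    # Assumption: the extension check is case-insensitive.
    if file_path.suffix.lower() != ".xlsx":
        return False
    # is_zipfile returns False for missing files and non-ZIP content.
    if not zipfile.is_zipfile(file_path):
        return False
    with zipfile.ZipFile(file_path) as archive:
        return "[Content_Types].xml" in archive.namelist()
```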
10) The uploaded Excel file now appears in the hi-repository.
11) To upload the Excel file to a particular folder, enter the folder name (e.g. /Folder) in the Data Warehouse path on the Management page (refer to the image above). This folder is created in the hi-repository/System folder.
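For a standalone setup, the location described in steps 7 and 11 can be worked out as in this sketch. The repository root `/opt/hi-repository` is a hypothetical install location and the helper is illustrative only. Note that the leading slash of a Data Warehouse path such as `/Folder` is stripped, because the folder is created under `System` rather than at the filesystem root.

```python
from pathlib import PurePosixPath


def standalone_upload_dir(repository_root: str, data_warehouse_path: str = "") -> str:
    """Return the folder where standalone uploads should land (steps 7 and 11).

    Hypothetical helper: the Data Warehouse path is placed under the System
    folder of the hi-repository, so "/Folder" becomes <root>/System/Folder.
    """
    relative = data_warehouse_path.strip("/")
    target = PurePosixPath(repository_root) / "System"
    return str(target / relative) if relative else str(target)


print(standalone_upload_dir("/opt/hi-repository", "/Folder"))
```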
12) Go to the Metadata page -> in the Catalog section, open Excel -> select the uploaded Excel file -> Add to Metadata -> apply joins, views and security conditions if necessary -> Save Metadata.
13) You can now use this metadata to create reports and more.
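Because the Excel file is read through Drill, you may find it useful to preview it in Drill directly before building metadata on it. The sketch below assumes your Drill release includes the Excel format plugin and has it enabled on the `dfs` storage plugin; the plugin name, the file path and the helper names are all assumptions to adapt to your setup.

```python
import json
import urllib.request


def excel_preview_sql(file_path: str, storage_plugin: str = "dfs", limit: int = 10) -> str:
    """Build a Drill query that reads the first rows of an .xlsx file.

    Assumes the Excel format plugin is enabled for storage_plugin ("dfs" here).
    """
    if "`" in file_path:
        raise ValueError("backticks are not allowed in the file path")
    return f"SELECT * FROM {storage_plugin}.`{file_path}` LIMIT {int(limit)}"


def run_drill_query(sql: str, base_url: str = "http://localhost:8047") -> list:
    """POST the query to Drill's /query.json endpoint and return its rows."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/query.json",
        data=json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["rows"]


print(excel_preview_sql("/opt/hi-repository/System/Folder/sales.xlsx"))
```

With Drill running, `run_drill_query(excel_preview_sql(...))` returns the preview rows as a list of dictionaries keyed by column name.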