Usage of Flat files(CSV) in Helical Insight

Starting with Helical Insight Enterprise Edition 5.2.2, we provide a built-in driver to connect to and use flat files such as Excel, CSV, JSON, and Parquet. You can therefore connect to and use CSV files directly within Helical Insight, without any middleware (such as Drill).

This article explains in detail how to connect to and use CSV files:

1. Log in to your Helical Insight application and go to the “Data Sources” module.

2. In the “Data Sources” module, you will see many options for connecting to different databases. From these, choose “Flatfile csv”. You can also use the search option at the top right to find this driver. Reference image is provided below:

[Image: Usage of Flat files (CSV) in Helical Insight]

3. Once you click on ‘Flatfile csv’ and choose ‘Create’, a popup will open. Reference image is provided below:

[Image: Usage of Flat files (CSV) in Helical Insight]

4. Host: The host can be provided in two ways:

a. Upload the file: If you upload a CSV file using the upload option, the file is stored at a specific location on the Helical Insight server and the Host field is filled in automatically.

b. Manually provide the CSV file path: In this case, the file must be present on the same server where Helical Insight is installed. Enter the file path in the Host field, as in the example below.

Example of a Linux path: /usr/local/traveldata.csv

We recommend using the file upload option.

5. In the “Datasource Name” section, provide any name of your choice; the connection will be saved and listed under this name.

6. In the ‘Configuration Editor’, provide the configuration details. These details generally change based on the type of flat file being used. In most cases, the connection to a CSV file works as it is, without any changes here: you can simply provide the connection name, test the connection, and save it. You can then use “Create Metadata” and continue working in Helical Insight.

ADVANCED USE CASES:

Many other configuration options are also available for more advanced use cases. In most cases they do not need to be changed. Some of them are described below.

7. Sample configuration details for the different flat file types are available from the icon next to the ‘Configuration Editor’. An image is provided below.

[Image: Usage of Flat files (CSV) in Helical Insight]

Clicking this icon opens a popup with the sample configurations for the different flat files.

[Image: Usage of Flat files (CSV) in Helical Insight]

The popup has a “Copy to clipboard” option that you can use to copy the content. Adjust the copied configuration to match your own details.

Explanation of configuration options:

1. tableName:

	Value: "mydata"
	Explanation: This specifies the name of the table that will be created or referenced when you create the metadata. In the above case, the table will be created with the name mydata.

2. strategy:

	Value: "in-memory"
	Explanation: Specifies the data processing strategy.
	"in-memory": Data will be processed and stored temporarily in memory, without persistence to a physical location (recommended approach).
	"persistent": Data will be persisted to a physical database file for storage. This strategy is recommended when the files are very large.

3. persistentLocation:

	Value: "" (empty string)
	Explanation: Indicates the location for persistent storage when the strategy is set to "persistent". For the "in-memory" strategy, this setting can be ignored.
	If the strategy is "persistent", a valid file path must be specified here (e.g., C:\\dbs\\test.duckdb).
	When left empty, no persistent storage is configured.
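As an illustration of the persistent strategy, the sketch below takes the CSV configuration used in the TravelData example of this article and changes only "strategy" and "persistentLocation". The file path is the sample path given above; the table name and the remaining values are assumptions borrowed from that example and should be adapted to your own setup.

```json
{
  "tableName": "TravelData",
  "strategy": "persistent",
  "persistentLocation": "C:\\dbs\\test.duckdb",
  "extensions": [
    "spatial"
  ],
  "config": {
    "auto_detect": true,
    "parallel": true
  }
}
```

Note that backslashes in a Windows path must be doubled inside a JSON string, as in the sample path.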

4. extensions:

	Value: ["excel", "spatial"]
	Explanation: Specifies the supported file types or processing extensions. For CSV, this should contain either "excel" or "spatial".
	"excel" enables the configuration to handle Excel files.
	"spatial" may indicate support for spatial data.

5. config:

This section contains additional configuration details for processing the flat file.

a. layer:

	Value: ["sheet 1", "sheet 2"]
	Explanation: Specifies the sheet(s) in the Excel file to be processed. An Excel file can have multiple sheets, and you can list the names of the sheets to be used, as above. Any sheet whose name is not listed is ignored.
	This option is not needed for CSV.

b. open_options:

	Value: ["HEADERS=FORCE", "FIELD_TYPES=AUTO"]
	Explanation: These are options for interpreting and processing the data.
	"HEADERS=FORCE" ensures that the first row of the sheet is treated as headers, even if this is not explicitly set in the CSV file.
	"FIELD_TYPES=AUTO" enables automatic detection and assignment of field types (e.g., string, integer, date).
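Putting the options above together, a configuration for an Excel file might look like the following sketch. It is assembled only from the sample values listed above, so the table name and sheet names are placeholders, and the sample shown in the product's popup may differ.

```json
{
  "tableName": "mydata",
  "strategy": "in-memory",
  "persistentLocation": "",
  "extensions": [
    "excel",
    "spatial"
  ],
  "config": {
    "layer": [
      "sheet 1",
      "sheet 2"
    ],
    "open_options": [
      "HEADERS=FORCE",
      "FIELD_TYPES=AUTO"
    ]
  }
}
```

For a CSV file, "layer" is not needed; the TravelData example in this article uses "auto_detect" and "parallel" inside "config" instead.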

8. As an example, we uploaded the “TravelData.csv” file using the ‘Upload’ option and provided the following configuration in the Configuration Editor:

{
  "tableName": "TravelData",
  "strategy": "in-memory",
  "persistentLocation": "",
  "extensions": [
    "spatial"
  ],
  "config": {
    "auto_detect": true,
    "parallel": true
  }
}

9. Click on ‘Test Connection’. If there are no issues with the configuration, the message “The connection test is successful” is shown. Then click on ‘Save Datasource’.

[Image: Usage of Flat files (CSV) in Helical Insight]

10. Go to the metadata page and expand the ‘Flatfile csv’ data source. Then expand the CSV file connection. Expand ‘memory’ and then ‘main’; the table name provided in the data source connection configuration will be shown. Drag the table into the metadata.

[Image: Usage of Flat files (CSV) in Helical Insight]

11. Create a report using the metadata and save it.

[Image: Usage of Flat files (CSV) in Helical Insight]

NOTE: More configuration options can also be added to the configuration. The options below can be used for reference.

Name	Description	Type	Default
all_varchar	Option to skip type detection for CSV parsing and assume all columns to be of type VARCHAR.	BOOL	false
allow_quoted_nulls	Option to allow the conversion of quoted values to NULL values	BOOL	true
auto_detect	Enables auto detection of CSV parameters.	BOOL	true
auto_type_candidates	Allows specifying types for CSV column type detection. VARCHAR is always included as a fallback.	TYPE[]	default types
columns	Specifies the column names and types within the CSV file (e.g., {'col1': 'INTEGER', 'col2': 'VARCHAR'}). Implies no auto detection.	STRUCT	(empty)
compression	The compression type for the file. Auto-detected by default (e.g., t.csv.gz -> gzip, t.csv -> none).	VARCHAR	auto
dateformat	Specifies the date format to use when parsing dates. See Date Format example below.	VARCHAR	(empty)
decimal_separator	The decimal separator of numbers.	VARCHAR	.
delim or sep	Specifies the character that separates columns within each row.	VARCHAR	,
escape	Specifies the string used to escape data character sequences matching the quote value.	VARCHAR	"
filename	Whether an extra filename column should be included in the result.	BOOL	false
force_not_null	Do not match specified columns’ values against the NULL string.	VARCHAR[]	[]
header	Specifies that the file contains a header line with the names of each column.	BOOL	false
hive_partitioning	Whether or not to interpret the path as a Hive partitioned path.	BOOL	false
ignore_errors	Option to ignore any parsing errors encountered and ignore rows with errors.	BOOL	false
max_line_size	The maximum line size in bytes.	BIGINT	2097152
names	The column names as a list.	VARCHAR[]	(empty)
new_line	Set the new line character(s) in the file. Options are '\r', '\n', or '\r\n'.	VARCHAR	(empty)
normalize_names	Whether column names should be normalized by removing non-alphanumeric characters.	BOOL	false
null_padding	Pads remaining columns on the right with null values if a row lacks columns.	BOOL	false
nullstr	Specifies the string or list of strings that represent a NULL value.	VARCHAR or VARCHAR[]	(empty)
parallel	Whether or not the parallel CSV reader is used.	BOOL	true
quote	Specifies the quoting string to be used when a data value is quoted.	VARCHAR	"
sample_size	The number of sample rows for auto detection of parameters.	BIGINT	20480
skip	The number of lines at the top of the file to skip.	BIGINT	0
timestampformat	Specifies the date format to use when parsing timestamps.	VARCHAR	(empty)
types or dtypes	The column types as either a list (by position) or a struct (by name).	VARCHAR[] or STRUCT	(empty)
union_by_name	Whether the columns of multiple schemas should be unified by name rather than by position.	BOOL	false
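
As a sketch of how such options might be combined, the configuration below extends the TravelData example from this article for a semicolon-separated file with a header row, day/month/year dates, and "NA" as the NULL marker. It assumes that the extra options go inside "config" alongside "auto_detect" and "parallel", as in that example, and that "dateformat" accepts strftime-style specifiers; the specific values are illustrative only.

```json
{
  "tableName": "TravelData",
  "strategy": "in-memory",
  "persistentLocation": "",
  "extensions": [
    "spatial"
  ],
  "config": {
    "auto_detect": true,
    "parallel": true,
    "delim": ";",
    "header": true,
    "dateformat": "%d/%m/%Y",
    "nullstr": "NA"
  }
}
```

After changing the configuration, it is worth running Test Connection again before saving the data source.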
