In this blog we are going to cover how to improve the performance of your Helical Insight instance. When all of these steps are followed it will improve the performance a lot.
a. Cache preparation and speeding up the cache preparation:As soon as a user creates the database connection on our datasource page our application starts preparaing the cache. In this cache we are preparing the list of all the catalogs, schemas and table names which are present in that specific connection.
So in case if you make a DBconnection which has got lot of tables like 5000 tables 10000 tables etc and then you immediately visit the metadata page, since the Helical Insight application have not had sufficient time to prepare cache, it might not be visible and you might get errors like “could not obtain cache”. So you should actually wait and when the cache is ready then only create metadata from the metadata page.
The most ideal way is you can create a new user at your DBlevel and this user should only have access to few tables which would be required for creating the reports and dashboards. This way since the cache to be prepared is for less number of tables, the cache preparation or cache refresh etc will always be very fast. Also even if you restart the server (in which case the cache gets removed) or refresh the cache etc even then in this case at the metadata page the performance would be very fast.
Please refer to the blog Overview of Home Page in Helical Insight 3.1, it has a section called cached datasources wherein you can understand more about cache and different statuses of cache.
b. Speeding up of fetching of Tables list: As mentioned in the above point as soon as a user creates a datasource the Helical Insight application starts fetching the list of tables for that specific datasource. There are processes which run to fetch the list and by default in every process/thread we fetch 25 table names. This count can also be increased to fetch more table names in a single
There is a file called “project.properties” and the file location is “..\hi\apache-tomcat-7\webapps\hi-ee\WEB-INF\classes”. You can open this file in edit mode and search for “metadataThreadsTableThreshold=”. It is generally present on line no 32. You can see that here it woudl be mentioned as 25, you can increase this upto 200. More advisable is keep it to 150.
Once done please restart the server. Now the process will fetch 150 table names in a single process.
c. Migration from Derby to Other Database: By default our application comes with Derby database which is a light weight file based database. This database stores details of cache, cached data, user role profiles etc.
Derby is not an enterprise level database. By migrating from Derby to some other DB like MySQL, Postgress, SQLServer etc it is possible to improve the performance as well.
You can refer to the below link to know more about doing the same thing.
Migration of user role from derby to another database
d. Optimal hardware: Based on the number of users who are accessing the application concurrently, the amount of data that is being transferred as well as the number of users who are using export the hardware sizing can be optimally decided. The hardware sizing will also depend on whether any middleware like Apache Drill, Dremio, etc are being used. People who are exporting the data also consume a huge amount of memory since with export of raw data al the dataset needs to be exported.
First thing which should be done is to increase the hardware sizing in conjunction with the amount of data and the number of users. Please make sure that the server allocated is dedicated hardware. A Linux server is always more preferable than a windows server since it is not having a GUI and thus automatically the performance is faster.
You can always reach out to support@helicalinsight.com to help you understand what could be the correct hardware sizing for your use case.
e. JVM: Based on the hardware allocated to Helical Insight it is also important to allocate the appropriate JVM to the application.
Try to give as much as possible JVM to the application.
How to Increase JVM Memory in Helical Insight Application
f. Tomcat Level Settings : Helical Insight by default gets installed with Tomcat webserver. There are various configurations which can be done at Tomcat level which can help with performance enhancement. Some of them we have covered below
1) Compress HTTP responses at Tomcat level
If you find that huge amount of data is being returned in the HTTP respnse, you can enable compression at the Tomcat level. This technique is especially useful for text-heavy dashboards and reports. For enabling compression, use the following steps:
- Shut down Tomcat server
- Edit the server.xml file present at location : …\hi\apache-tomcat-7\conf and add the attributes compression, compressMinSize, noCompressionUserAgents and compressibleMimeType to the connector. You can use the example below:
<Connector compressibleMimeType ="text/html,text/xml,text/css,text/javascript,application/javascript,application/json"
compression="on"
compressionMinSize="128"
connectionTimeout="20000"
noCompressionUserAgents="gozilla, traviata"
port="8080"
redirectPort="8443"protocol="HTTP/1.1"/>
- Restart Tomcat
2) Increase Number of concurrent threads at Tomcat Level
In order to increase the number of concurrent threads in Tomcat, we need to use the maxThreads attribute. For this, edit the server.xml file present at location …\hi\apache-tomcat-7\conf.
The example below shows how this can be used. The default value of maxThreads is 150. This can be increased further.
<Connector port="8011" protocol="HTTP/1.1"
connectionTimeout="20000"
maxThreads="150"
redirectPort="8447" />
g. Caching: We are having a caching functionality with cache timeout defined as 24 hours. If you think that yoru reporting db/DW is having new data refresh even latter than 24 hours then you can increase the timeout further.
Even at report level also you are having cache timeout refresh option. Usage of Cache Refresh Script
Note: Disabling cache memory, or having cache refresh for each and every report at a very short span of time will lead to performance degrading.
h. Raw Data Export: Charts generally have aggregated dataset, so exporting them does not lead to a lot of load on the BI server or DB server. Whereas there might be tabular reports as well which can have really huge amount of data with no aggregation applied. Though when being viewed at the frontend there is pagination which only shows first 10 records whereas the other set of records is being fetched and loaded in the subsequent pages.
Whereas if a person clicks on the export of this report, the entire data is to be downloaded from the DB server and then downloaded to the local laptop. Hence in those cases, this can result in performance issues. Hence the data export should be limited to only a few users / should be scheduled for a time when the load is lowest via HWF (Helical Workflow).
i. METADATA: It is a good practice to have only those tables and columns which are required for reports. A big metadata with lot of tables and columns will bring down the performance. You can also think of making multiple metadatas for different set of reporting requirements like a HR metadata for HR reports, sales metadata for sales reports etc. This way metadata size can be kept less.
On the datasources page, you can limit how much information gets displayed at the metadata layer (a sort of prefiltering). This can be done by going to “Advanced” option in the datasource connection.
In the “Other Options” placeholder, you can add restricting strings such as below:
?HI_SCHEMA_CONTAINS=hi (this will show only the schema names that contain the string hi)
?HI_CATALOG_CONTAINS=t (this will show only the catalog names that contain t)
?HI_CATALOG=travel_data (this will show only travel_data catalog)
Similar strings and regular expression (regex) can be used to restrict the database tables, schema, cataglos that are visible at the metadata level.
So with above method you can initially restrict the size of listing on the metadata page so that it is fast and make initial metadata. Then you can edit the already created datasource to include other table/schema (using the above mentioned method), then edit the metadata created earlier (refresh the cache) and then select additional tables required and make it part of the metadata by using the merge option. This way your create-metadata browser page will never be so heavy and will work fine.
j. Report designing: At report level any unnecessary sorting, filtering, joinsand other conditions should be avoided. The simpler the query, the faster it will execute. When having limited number of rows, you can use Settings option to add limit clause so that only limited data get fetched. Viewing in table is always better since there is pagination as compared to crosstab which tries to show entire data at a single instance.
k. Dashboard designing: Design an aesthetic dashboard with optimum number of panels. This will improve readability as well as performance. It is also important to keep a track of the number of dashboard variables being used. Check the number of dashboard variables by going to the “Variables” section in the toolbar.
If there are 4 reprots in a dashboard and they are listening to same dashboard variable, then it is a good practice to give the exact same names to the filters at reports level as well (IMAGE). This minimizes number of dashboard variables at dashboard level as well as helps avoid confusion while mapping the input parameters and listeners in individual reports on the dashboard. If you give different filternames at report level and even thought they will all listen to same dashboard variable at dashboard, but still since filternames are different at reportlevel we would be able to see a lot of dashboard variables which will affect the performance.
l. Load Balancing: Helical Insight application also supports Load balancing. You can have multiple servers between which you can distribute your load. You can refer to the below link to understand how to implement load balancing across multiple servers.
Implementing High Availability And Load Balancing In Helical Insight
m. Database Tuning: Generally whenever a person is viewing the report we are depending on the performance of the DB to give us the aggregated response to show the data at the frontend. Hence having a data warehouse / an optimized reporting database with Indexes etc will automatically help in making sure that the queries are very fast, thus the reports will also be rendered very fast.
- It would be ideal if there is a separate reporting db / datawarehouse
- It would be ideal if there are prebuilt calculated tables from which reporting happens with minimal to no joins
- The database should be able to support many concurrent connections
n. In cases when groovy connection is being used, then groovy managed connection to be used rather than groovy plain connection.
o. Inmemory Mode: Helical Insight can also be used in an In-memory mode wherein it is possible to load the dataset required for reporting from your transactional DB into a local columnar fast DB (like Presto, Spark, etc) and in turn Helical Insight will connect to this one. Since everything is present locally in RAM the performance is extremely fast here.
You can reach out to us at support@helicalinsight.com to know more about any of the steps mentioned above.