Apache Hive
The Apache Hive connector allows Trino to connect to a Hive metastore and query data stored in Apache Hadoop (HDFS) or S3-compatible object storage.
Example Hive catalog configuration
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  # The name of the catalog as it will appear in Trino
  name: hive-catalog
  # TrinoCluster can use these labels to select which catalogs to include
  labels:
    trino: simple-trino
spec:
  connector:
    # Specify hive here when defining a hive catalog
    hive:
      metastore:
        configMap: simple-hive
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
  # We can use configOverrides to add arbitrary properties to the Trino catalog configuration
  configOverrides:
    hive.metastore.username: trino
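The labels on the catalog only take effect once a TrinoCluster selects them. The fragment below is a minimal sketch of that selection, not a complete cluster definition. It assumes the selector field is called `catalogLabelSelector` and sits under `spec.clusterConfig`; the exact location may differ between operator versions, and the cluster name `simple-trino` is chosen here only to mirror the label value.

```yaml
# Sketch only: image, coordinator and worker settings are omitted.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino            # hypothetical cluster name
spec:
  clusterConfig:
    # Assumed field location; check the CRD of your operator version.
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino     # must match the labels on the TrinoCatalog
```

With a selector like this, every TrinoCatalog carrying the label `trino: simple-trino` would be included in the cluster.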
Connect to S3 store
The Hive connector can connect to an S3 store either through an inline definition or through a reference to an existing S3 connection:
spec:
  connector:
    hive:
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
      # OR
      s3:
        reference: my-minio
See S3 resources for details about S3 connections.
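When `reference: my-minio` is used, a separate S3 connection object with that name has to exist in the same namespace. The following is a sketch of what such an object might look like, assuming the `S3Connection` kind from the `s3.stackable.tech/v1alpha1` API group and reusing the values from the inline example above; verify the schema against the S3 resources documentation for your release.

```yaml
# Sketch only: mirrors the inline S3 settings as a standalone resource.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-minio                # the name used in `reference:`
spec:
  host: test-minio
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: minio-credentials
```

A referenced connection is convenient when several catalogs or products share the same S3 endpoint, since the settings are then maintained in one place.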
Make sure that the underlying Hive metastore also has access to the S3 store, because the metastore itself accesses the storage, for example to check whether a directory exists when a table is created.
Connect to HDFS cluster
The Hive connector can connect to an HDFS cluster operated by Stackable as follows:
spec:
  connector:
    hive:
      hdfs:
        configMap: simple-hdfs
Make sure that the underlying Hive metastore also has access to the HDFS cluster, because the metastore itself accesses the storage, for example to check whether a directory exists when a table is created.
Adding unmanaged Hive clusters
You can connect Trino to Hive catalogs from systems that are not managed by Stackable, including Hive running on existing Hadoop clusters. Unmanaged Hive instances are defined by creating ConfigMaps that contain the configuration for the remote Hive metastore and for the HDFS or S3 storage service.
Create a Hive Metastore configMap
The Hive metastore ConfigMap contains the URL for the metastore’s thrift endpoint.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hive
data:
  HIVE: thrift://10.132.0.59:9083
Create an HDFS ConfigMap
When the Hive data is stored on HDFS, you need to provide a ConfigMap containing the HDFS configuration.
To do this, take the core-site.xml and hdfs-site.xml files from your Hadoop cluster and create a ConfigMap with the keys core-site.xml and hdfs-site.xml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hdfs
data:
  core-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://my.hadoop.cluster:8020</value>
      </property>
      <!-- truncated for brevity -->
    </configuration>
  hdfs-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>dfs.namenode.servicerpc-address</name>
        <value>my.hadoop.cluster:8022</value>
      </property>
      <!-- truncated for brevity -->
    </configuration>
Create the Trino Hive catalog
To use the unmanaged Hive metastore, define a TrinoCatalog object in the same way as for a managed cluster, referencing the custom ConfigMaps created for Hive and HDFS.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      metastore:
        configMap: cloudera-hive
      hdfs:
        configMap: cloudera-hdfs
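If the unmanaged Hive instance keeps its data in S3 rather than HDFS, the same pattern should apply with an `s3` section in place of `hdfs`. The sketch below only recombines fields already shown on this page; the catalog name `clouderahive-s3` is hypothetical, and `my-minio` stands for whatever S3 connection describes the external store.

```yaml
# Sketch only: unmanaged metastore combined with S3 storage.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive-s3         # hypothetical catalog name
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      metastore:
        configMap: cloudera-hive   # ConfigMap with the Thrift endpoint
      s3:
        reference: my-minio        # or an inline S3 definition
```

As with the managed case, the remote metastore itself also needs access to the S3 store.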