Running Sqoop to get SQL data on Azure HDInsight Instance

Azure HDInsight is a fully-managed cloud service that makes it easy, fast, and cost-effective to process massive amounts of data. Use popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R & more.

Get started with Azure Hdinsight by following below link
https://azure.microsoft.com/en-us/services/hdinsight/

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Source: http://sqoop.apache.org/

Let's say if you want to run Sqoop on HDInsight node to Receive/Transfer Data from SQL.

Here is one such scenario where i was trying to list all the tables in my Azure SQL Database..

> sqoop list-tables --connect "jdbc:sqlserver://<YOUR_SQL_SERVER_NAME>.database.windows.net:1433;database=<YOUR_DATABASE_NAME>;user=<USER>;password=<PASSWORD>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;" -- --schema "<SCHEMA_IF_EXISTS>"

It might end up with below error

18/04/03 22:15:29 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.microsoft.jdbc.sqlserver.SQLServerDr
java.lang.RuntimeException: Could not load db driver class: com.microsoft.jdbc.sqlserver.SQLServerDriver
        at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)

Sqoop by default supports Several Databases, including MySQL but to use it for MSSQL you need to install SQL Driver.

Fortunately, we already have this installed for hive @ /usr/hdp/2.6.3.2-13/hive/lib/sqljdbc41.jar for each node in HDInsight.

All you have to do is run below line of code before using sqoop

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/hdp/2.6.3.2-13/hive/lib/sqljdbc41.jar

Now i run the same query as above

$ sqoop list-tables --connect "jdbc:sqlserver://<YOUR_SQL_SERVER_NAME>.database.windows.net:1433;database=<YOUR_DATABASE_NAME>;user=<USER>;password=<PASSWORD>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;" -- --schema "<SCHEMA_IF_EXISTS>"

As you can see below, it has listed all the tables in my Database.

Warning: /usr/hdp/2.6.3.2-13/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/04/03 22:29:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.2-13
18/04/03 22:29:36 INFO manager.SqlManager: Using default fetchSize of 1000
18/04/03 22:29:36 INFO manager.SQLServerManager: We will use schema SalesLT
Customer
ProductModel
vProductModelCatalogDescription
ProductDescription
Product
ProductModelProductDescription
vProductAndDescription
ProductCategory
vGetAllCategories
Address
CustomerAddress
SalesOrderDetail
SalesOrderHeader
Prashanth Madi

Prashanth Madi

I'm a programmer & Tech enthusiast. I work for Big Data Support Team at Microsoft, but this blog, its content & opinions are my own. I blog about tech, gadgets, code, where we're going & we've been.

Read More