【Spark八十五】Spark Streaming分析结果落地到MySQL
Mysql  /  houtizong 发布于 3年前   149
几点总结:
1. DStream.foreachRDD是一个Output Operation,类似于RDD的action,会触发Job的提交。DStream.foreachRDD是数据落地很常用的方法
2. 获取MySQL Connection的操作应该放在foreachRDD的参数(是一个RDD[T]=>Unit的函数类型),这样,当foreachRDD方法在每个Worker上执行时,连接是在Worker上创建。如果Connection的获取放到dstream.foreachRDD之前,那么
Connection的获取动作将发生在Driver端,然后通过序列化的方式发送到各个Worker(Connection的序列化通常是无法正确序列化的)
3. Connection的获取在foreachRDD的参数中获取,同时还要在遍历RDD之前获取(调用RDD的foreach方法前获取),如果遍历中获取,那么RDD中的每个record都要打开关闭连接,这对于数据库连接资源将是极大的考验
4. 业务逻辑处理定义在func中,它是在foreachRDD的方法参数体中定义的,如果把func的定义放到外面,即Driver中,貌似也是可以的,Spark会对计算方法通过Broadcast进行广播到各个计算节点。
package spark.examples.streamingimport java.sql.{PreparedStatement, Connection, DriverManager}import java.util.concurrent.atomic.AtomicIntegerimport org.apache.spark.SparkConfimport org.apache.spark.streaming.{Seconds, StreamingContext}import org.apache.spark.streaming._import org.apache.spark.streaming.StreamingContext._//No need to call Class.forName("com.mysql.jdbc.Driver") to register Driver?object SparkStreamingForPartition { def main(args: Array[String]) { val conf = new SparkConf().setAppName("NetCatWordCount") conf.setMaster("local[3]") val ssc = new StreamingContext(conf, Seconds(5)) //This ds
tream object represents the stream of data that will be received from the data //server. Each record in this DStream is a line of text //The DStream is a collection of RDD, which makes the method foreachRDD reasonable val dstream = ssc.socketTextStream("192.168.26.140", 9999) dstream.foreachRDD(rdd => { //embedded function def func(records: Iterator[String]) { var conn: Connection = null var stmt: PreparedStatement = null try { val url = "jdbc:mysql://192.168.26.140:3306/person"; val user = "root"; val password = "" conn = DriverManager.getConnection(url, user, password) records.flatMap(_.split(" ")).foreach(word => { val sql = "insert into TBL_WORDS(word) values (?)"; stmt = conn.prepareStatement(sql); stmt.setString(1, word) stmt.executeUpdate(); }) } catch { case e: Exception => e.printStackTrace() } finally { if (stmt != null) { stmt.close() } if (conn != null) { conn.close() } } } val repartitionedRDD = rdd.repartition(3) repartitionedRDD.foreachPartition(func) }) ssc.start() ssc.awaitTermination() }}
请勿发布不友善或者负能量的内容。与人为善,比聪明更重要!
技术博客集 - 网站简介:
前后端技术:
后端基于Hyperf2.1框架开发,前端使用Bootstrap可视化布局系统生成
网站主要作用:
1.编程技术分享及讨论交流,内置聊天系统;
2.测试交流框架问题,比如:Hyperf、Laravel、TP、beego;
3.本站数据是基于大数据采集等爬虫技术为基础助力分享知识,如有侵权请发邮件到站长邮箱,站长会尽快处理;
4.站长邮箱:[email protected];
文章归档
文章标签
友情链接