【Hive 1】Getting Started with Hive
Hive runs on top of Hadoop, so Hadoop 2.5.2 must be installed first, and Hadoop must be running before Hive is started.
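If the cluster is not yet running, it can be brought up with the standard Hadoop 2.x scripts; a minimal sketch, assuming Hadoop is installed at /home/hadoop/hadoop-2.5.2 as in the steps below:
/home/hadoop/hadoop-2.5.2/sbin/start-dfs.sh    # starts the NameNode and DataNodes
/home/hadoop/hadoop-2.5.2/sbin/start-yarn.sh   # starts the ResourceManager and NodeManagers
jps                                            # verify that the daemons are up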
1. Download Hive 0.14.0 from:
http://mirror.bit.edu.cn/apache/hive/
2. Unpack Hive, then add the HIVE_HOME and PATH variables to the system profile:
sudo vim /etc/profile
export HIVE_HOME=/home/hadoop/apache-hive-0.14.0-bin
export PATH=$HIVE_HOME/bin:$PATH
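To make the new variables take effect in the current shell, source the profile and verify (assuming the paths above):
source /etc/profile
echo $HIVE_HOME    # should print /home/hadoop/apache-hive-0.14.0-bin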
3. Create Hive's environment file from the template:
cp hive-env.sh.template hive-env.sh
4. Set the following variables in hive-env.sh:
HADOOP_HOME=/home/hadoop/hadoop-2.5.2
export HIVE_CONF_DIR=/home/hadoop/apache-hive-0.14.0-bin/conf
5. Create hive-site.xml from the default template:
cp hive-default.xml.template hive-site.xml
Edit hive-site.xml and replace the variables ${system:java.io.tmpdir} and ${system:user.name} with the directory below, which must be created manually. Note in particular that ${system:java.io.tmpdir} occurs in several places; make sure all of them are changed:
/home/hadoop/apache-hive-0.14.0-bin/iotmp
If this step is skipped, starting Hive with the hive command fails with an error like the following:
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
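Rather than editing each occurrence by hand, the substitution can be done in bulk; a sketch, assuming the iotmp directory above (back up hive-site.xml first):
mkdir -p /home/hadoop/apache-hive-0.14.0-bin/iotmp
cd /home/hadoop/apache-hive-0.14.0-bin/conf
sed -i 's#${system:java.io.tmpdir}#/home/hadoop/apache-hive-0.14.0-bin/iotmp#g' hive-site.xml
sed -i 's#${system:user.name}#hadoop#g' hive-site.xml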
6. Run the following command on Hadoop to create the /user/hive/warehouse directory:
hdfs dfs -mkdir -p /user/hive/warehouse
This directory must exist on HDFS because hive-site.xml contains the following property:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- This is the dir for hadoop -->
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
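It is also commonly recommended (for instance in Hive's Getting Started guide) to make the warehouse directory and /tmp group-writable so Hive can write to them; a minimal sketch:
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse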
With the installation complete, Hive can be used from its command-line shell.
1. Start Hive with the hive command:
>hive
2. Run show tables; to list the current tables:
hive> show tables;
OK
Time taken: 0.863 seconds
3. Create a simple table:
hive> create table abc(a int, b string);
OK
Time taken: 1.144 seconds
4. Table operations:
4.1 Create a table:
hive> create table Word (id int, word string) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.153 seconds
The new table Word has two columns, id and word, of integer and string type respectively; rows are stored as text with fields separated by commas.
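The schema can be verified with describe; the output below is roughly what Hive 0.14 prints (formatting may vary by version):
hive> describe Word;
OK
id                      int
word                    string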
4.2 Create a file word.txt on the local file system under /home/hadoop/ with the following content:
1,MSN
10,QQ
100,Gtalk
1000,Skype
4.3 Load word.txt into HDFS through Hive:
hive> load data local inpath '/home/hadoop/word.txt' into table Word;
Loading data to table default.word
Table default.word stats: [numFiles=1, totalSize=20]
OK
Time taken: 2.154 seconds
The local keyword indicates that the data is loaded into the Word table from the local file system (/home/hadoop/word.txt) rather than from HDFS.
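Without local, the path is interpreted as an HDFS path, and the file is moved (not copied) into the table's warehouse directory. A hypothetical example, assuming word.txt was first uploaded to /tmp/word.txt on HDFS:
hive> load data inpath '/tmp/word.txt' into table Word;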
4.4 Query: select * from Word;
hive> select * from Word;
OK
1       MSN
10      QQ
100     Gtalk
1000    Skype
Time taken: 0.671 seconds, Fetched: 4 row(s)
Query with a condition: select * from Word where id = 10;
hive> select * from Word where id = 10;
OK
10      QQ
Time taken: 0.095 seconds, Fetched: 1 row(s)
Referencing a column that does not exist fails with a semantic error:
hive> select * from Word where id1 = 1;
FAILED: SemanticException [Error 10004]: Line 1:25 Invalid table alias or column reference 'id1': (possible column names are: id, word)
Hive stores table data under the warehouse directory on HDFS, so the tables can be inspected directly with HDFS commands.
1. List the warehouse directory:
hadoop@tom-Inspiron-3521:~/hadoop-2.5.2/bin$ hdfs dfs -ls /user/hive/warehouse
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-24 23:17 /user/hive/warehouse/abc
drwxr-xr-x   - hadoop supergroup          0 2014-12-24 23:28 /user/hive/warehouse/word
abc and word are the directories backing the two tables created in Hive.
2. List the word table's directory:
hadoop@tom-Inspiron-3521:~/hadoop-2.5.2/bin$ hdfs dfs -ls /user/hive/warehouse/word
Found 1 items
-rw-r--r--   1 hadoop supergroup         20 2014-12-24 23:28 /user/hive/warehouse/word/word.txt
word.txt under the word directory is the file uploaded by the earlier load step.
3. View the content of the uploaded file:
hadoop@tom-Inspiron-3521:~/hadoop-2.5.2/bin$ hdfs dfs -cat /user/hive/warehouse/word/word.txt
1,MSN
10,QQ
100,Gtalk
1000,Skype
Inserting a row with insert into launches a MapReduce job (the my_word table below was created in the same way as Word and loaded with the same word.txt):
hive> insert into table my_word values(10, "WeChat");
Query ID = hadoop_20150308231111_f2c753b4-e528-4081-887e-cf310dc76695
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0001, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0001/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job -kill job_1425868733189_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-08 23:11:32,345 Stage-1 map = 0%, reduce = 0%
2015-03-08 23:11:43,706 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.23 sec
MapReduce Total cumulative CPU time: 2 seconds 230 msec
Ended Job = job_1425868733189_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop.master:9000/tmp/hive/hadoop/70d6d067-6898-4d12-9190-1431ddb4ff9a/hive_2015-03-08_23-11-15_056_4782682719483814130-1/-ext-10000
Loading data to table default.my_word
Table default.my_word stats: [numFiles=2, numRows=1, totalSize=51, rawDataSize=9]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.23 sec   HDFS Read: 288 HDFS Write: 81 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 230 msec
OK
Time taken: 30.149 seconds
hive> select * from my_word;
OK
10      WeChat
1       MSN
10      QQ
100     Gtalk
1000    Skype
NULL    NULL
Time taken: 0.05 seconds, Fetched: 6 row(s)
After the insert, the changes on HDFS are as follows:
[hadoop@hadoop bin]$ ./hdfs dfs -ls /user/hive/warehouse/my_word
Found 2 items
-rw-r--r--   2 hadoop supergroup         10 2015-03-08 23:11 /user/hive/warehouse/my_word/000000_0
-rw-r--r--   2 hadoop supergroup         41 2015-03-08 23:09 /user/hive/warehouse/my_word/word.txt
[hadoop@hadoop bin]$ ./hdfs dfs -cat /user/hive/warehouse/my_word/word.txt
1,MSN
10,QQ
100,Gtalk
1000,Skype
[hadoop@hadoop bin]$ ./hdfs dfs -cat /user/hive/warehouse/my_word/000000_0
10,WeChat
The original word.txt is unchanged; a new file 000000_0 has been added, containing the newly inserted row.
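As a side note, HDFS commands can also be run from inside the Hive CLI with dfs, which avoids switching shells; for example:
hive> dfs -ls /user/hive/warehouse/my_word;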
Out of the box, Hive does not support the update operations delete and update; they can only be accomplished indirectly by other means:
hive> delete from my_word where id = 100;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive> insert overwrite table my_word select * from my_word where id != 100;
Query ID = hadoop_20150308232020_d8e7491c-006e-4377-8962-8a01dc651a82
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425868733189_0002, Tracking URL = http://hadoop.master:8088/proxy/application_1425868733189_0002/
Kill Command = /home/hadoop/software/hadoop-2.5.2/bin/hadoop job -kill job_1425868733189_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-08 23:20:34,938 Stage-1 map = 0%, reduce = 0%
2015-03-08 23:20:53,232 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.55 sec
MapReduce Total cumulative CPU time: 2 seconds 550 msec
Ended Job = job_1425868733189_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop.master:9000/tmp/hive/hadoop/70d6d067-6898-4d12-9190-1431ddb4ff9a/hive_2015-03-08_23-20-20_635_4212544169044890358-1/-ext-10000
Loading data to table default.my_word
Table default.my_word stats: [numFiles=1, numRows=4, totalSize=38, rawDataSize=34]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.55 sec   HDFS Read: 346 HDFS Write: 109 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 550 msec
OK
Time taken: 36.279 seconds
hive> select * from my_word;
OK
10      WeChat
1       MSN
10      QQ
1000    Skype
Time taken: 0.086 seconds, Fetched: 4 row(s)
Now check the state of HDFS:
[hadoop@hadoop bin]$ ./hdfs dfs -ls /user/hive/warehouse/my_word/
Found 1 items
-rw-r--r--   2 hadoop supergroup         38 2015-03-08 23:20 /user/hive/warehouse/my_word/000000_0
[hadoop@hadoop bin]$ ./hdfs dfs -cat /user/hive/warehouse/my_word/000000_0
10,WeChat
1,MSN
10,QQ
1000,Skype
As expected, the original word.txt has been overwritten; only the 000000_0 file remains, and it no longer contains the record with id 100. Note that the NULL row has also disappeared: NULL != 100 evaluates to NULL rather than true, so that row is filtered out by the where clause.
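For completeness: Hive 0.14 does add real update/delete, but only for bucketed ORC tables with transactions enabled; the sketch below is illustrative (the table name txn_word and the bucket count are made up, and a production setup needs further settings such as the compactor options):
hive> set hive.support.concurrency=true;
hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> create table txn_word (id int, word string)
    >   clustered by (id) into 2 buckets
    >   stored as orc tblproperties ('transactional'='true');
hive> delete from txn_word where id = 100;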