Updated: 2022-12-18 10:21:22
I have the following human-readable dates stored in a text file:
Wed Oct 15 09:26:09 BST 2014
Wed Oct 15 19:26:09 BST 2014
Wed Oct 18 08:26:09 BST 2014
Wed Oct 23 10:26:09 BST 2014
Sun Oct 05 09:26:09 BST 2014
Wed Nov 20 19:26:09 BST 2014
How can I convert these dates so that they are compatible with Pig's ToDate() function? I then want to use GetHour(), GetYear(), GetDay() and GetMonth() to apply date range constraints and logic to my queries.
1. Pig supports only a few date formats, so you need to convert your date and time into one of the formats described here:
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
2. Your input uses BST as its time zone, but Pig does not support BST, so you need to choose a supported time zone that is equivalent to it.
The available time zones are listed here: http://joda-time.sourceforge.net/timezones.html
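For point 2, one candidate from the Joda-Time list is Europe/London, the region ID whose summer offset is British Summer Time. The sketch below assumes that this is what BST means in the data (the abbreviation is ambiguous) and reuses the field names from the script in this answer. It has not been run against a specific Pig version, so treat it as a starting point rather than a verified script.

```pig
-- Sketch only: same parse as in this answer, but with a region-based zone ID.
-- 'Europe/London' is an assumption about what BST means in the data;
-- substitute whichever zone your data really uses.
A = LOAD 'input.txt' USING PigStorage(' ')
    AS (day:chararray, month:chararray, date:chararray,
        time:chararray, tzone:chararray, year:chararray);
-- Rebuild the string as "Wed, 15 Oct 2014 09:26:09"; the BST token is dropped.
B = FOREACH A GENERATE
    CONCAT(CONCAT(CONCAT(CONCAT(day, ', ', date), ' ', month), ' ', year), ' ', time) AS mytime;
-- The third argument names the zone in which the string is interpreted.
C = FOREACH B GENERATE ToDate(mytime, 'EEE, d MMM yyyy HH:mm:ss', 'Europe/London') AS newTime;
DUMP C;
```

The hour, day and month read back with GetHour(), GetDay() and GetMonth() should be the same wall-clock values as with 'GMT', since the string is parsed in the named zone and no conversion between zones takes place. Which offset ends up attached to each value depends on how your Pig version resolves the zone ID, so it is worth checking a few rows with DUMP before relying on it.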
Example:
input.txt
Wed Oct 15 09:26:09 BST 2014
Wed Oct 15 19:26:09 BST 2014
Wed Oct 18 08:26:09 BST 2014
Wed Oct 23 10:26:09 BST 2014
Sun Oct 05 09:26:09 BST 2014
Wed Nov 20 19:26:09 BST 2014
PigScript:
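-- Split each line on spaces into its six tokens.
A = LOAD 'input.txt' USING PigStorage(' ') AS (day:chararray,month:chararray,date:chararray,time:chararray,tzone:chararray,year:chararray);
-- Rebuild the timestamp as "Wed, 15 Oct 2014 09:26:09", leaving out the unsupported BST token.
B = FOREACH A GENERATE CONCAT(CONCAT(CONCAT(CONCAT(day,', ',date),' ',month),' ',year),' ',time) AS mytime;
-- Parse the rebuilt string; the third argument is the time zone used to interpret it.
C = FOREACH B GENERATE ToDate(mytime,'EEE, d MMM yyyy HH:mm:ss','GMT') AS newTime;
D = FOREACH C GENERATE GetMonth(newTime),GetDay(newTime),GetYear(newTime),GetHour(newTime),GetMinute(newTime);
DUMP D;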
Output:
(10,15,2014,9,26)
(10,15,2014,19,26)
(10,15,2014,8,26)
(10,22,2014,10,26)
(10,5,2014,9,26)
(11,19,2014,19,26)
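Note that GetDay() returns 15, 22 and 19 for the input rows dated 18, 23 and 20. Those three rows carry a weekday name (Wed) that does not match the calendar date (18 October 2014 was a Saturday; 23 October and 20 November were Thursdays), and the parser appears to resolve the conflict in favour of the named weekday. If the day of the month is the value you trust, one option is to leave the weekday out of the string you parse.

The question also asks about date range constraints. A minimal sketch of that step follows, assuming the same input.txt and field names as above. The relation name E and the chosen range (October 2014, 09:00 to 17:59) are illustrative only, and the script relies on CONCAT accepting more than two arguments, as the script in this answer already does.

```pig
-- Sketch only: parse without the weekday token, then filter on date parts.
A = LOAD 'input.txt' USING PigStorage(' ')
    AS (day:chararray, month:chararray, date:chararray,
        time:chararray, tzone:chararray, year:chararray);
-- Build "15 Oct 2014 09:26:09"; both the weekday and the BST token are omitted.
B = FOREACH A GENERATE
    CONCAT(CONCAT(CONCAT(date, ' ', month), ' ', year), ' ', time) AS mytime;
C = FOREACH B GENERATE ToDate(mytime, 'd MMM yyyy HH:mm:ss', 'GMT') AS newTime;
-- Keep rows from October 2014 whose hour falls in 09..17 (illustrative range).
E = FILTER C BY GetYear(newTime) == 2014
            AND GetMonth(newTime) == 10
            AND GetHour(newTime) >= 9
            AND GetHour(newTime) < 18;
DUMP E;
```

The same pattern extends to GetDay() or GetMinute() conditions; each Get* call reads its field from the parsed datetime, so the constraints can be combined freely with AND and OR.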