且构网

Sharing all things programmer development...

Convert human-readable string dates to dates using Pig?

Updated: 2022-12-18 10:21:22

I have the following human-readable dates stored in a text file:

Wed Oct 15 09:26:09 BST 2014
Wed Oct 15 19:26:09 BST 2014
Wed Oct 18 08:26:09 BST 2014
Wed Oct 23 10:26:09 BST 2014
Sun Oct 05 09:26:09 BST 2014
Wed Nov 20 19:26:09 BST 2014

How can I convert these dates so that they are compatible with Pig's ToDate() function? I then want to use GetHour(), GetYear(), GetDay() and GetMonth() to apply date range constraints and logic to my queries.

1. Pig supports only a few date formats, so you need to convert your date and time to match one of the formats described here:
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

2. Your input uses BST as its timezone, but Pig does not support BST, so you need to choose a different timezone that is equivalent to it.
The available timezones are listed here: http://joda-time.sourceforge.net/timezones.html
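
As a sketch of point 2, one candidate from that list is 'Europe/London', the zone ID for UK local time. This assumes your Pig version accepts Joda-Time zone IDs in the three-argument form of ToDate() and that CONCAT takes more than two arguments. Dropping the weekday field from the rebuilt string is a simplification chosen for this sketch, not part of the original answer.

```pig
-- Hypothetical variant: parse the timestamps as UK local time instead of GMT.
-- 'Europe/London' applies BST (UTC+1) in summer and GMT in winter.
A = LOAD 'input.txt' USING PigStorage(' ')
    AS (day:chararray, month:chararray, date:chararray,
        time:chararray, tzone:chararray, year:chararray);

-- Rebuild the string as "15 Oct 2014 09:26:09" (weekday and BST fields left out).
B = FOREACH A GENERATE CONCAT(date, ' ', month, ' ', year, ' ', time) AS mytime;

-- The pattern must match the rebuilt string exactly.
C = FOREACH B GENERATE ToDate(mytime, 'd MMM yyyy HH:mm:ss', 'Europe/London') AS newTime;
DUMP C;
```

Leaving the weekday out of the pattern means the parser only sees the calendar date, so a weekday that disagrees with the date cannot affect the result.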

Examples:

  1. I chose the time format "EEE, d MMM yyyy HH:mm:ss Z" (for example, "Wed, 4 Jul 2001 12:08:56 -0700") because it roughly matches your input data. The script drops the trailing Z, since the timezone is passed to ToDate() as a separate argument.
  2. The BST timezone is not available, so I chose 'GMT' as the timezone; you can change it according to your needs.

input.txt

Wed Oct 15 09:26:09 BST 2014
Wed Oct 15 19:26:09 BST 2014
Wed Oct 18 08:26:09 BST 2014
Wed Oct 23 10:26:09 BST 2014
Sun Oct 05 09:26:09 BST 2014
Wed Nov 20 19:26:09 BST 2014

PigScript:

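-- Split each line on spaces into its six fields.
A = LOAD 'input.txt' USING PigStorage(' ') AS (day:chararray, month:chararray, date:chararray, time:chararray, tzone:chararray, year:chararray);
-- Rebuild the string as "Wed, 15 Oct 2014 09:26:09"; the BST field is dropped.
B = FOREACH A GENERATE CONCAT(CONCAT(CONCAT(CONCAT(day,', ',date),' ',month),' ',year),' ',time) AS mytime;
-- Parse the rebuilt string, treating the wall-clock time as GMT.
C = FOREACH B GENERATE ToDate(mytime,'EEE, d MMM yyyy HH:mm:ss','GMT') AS newTime;
-- Extract the individual date and time fields.
D = FOREACH C GENERATE GetMonth(newTime),GetDay(newTime),GetYear(newTime),GetHour(newTime),GetMinute(newTime);
DUMP D;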

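Output (rows 3, 4 and 6 come back as Oct 15, Oct 22 and Nov 19 because the weekday "Wed" in those input lines does not match the calendar date, and the parser appears to resolve the conflict in favor of the weekday):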

(10,15,2014,9,26)
(10,15,2014,19,26)
(10,15,2014,8,26)
(10,22,2014,10,26)
(10,5,2014,9,26)
(11,19,2014,19,26)
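
To apply the date range constraints the question asks about, the parsed datetime can be used directly in a FILTER. A minimal sketch, meant to be appended to the script above in place of the final two statements: it assumes relation C with its newTime field, while the relation name E and the cut-off values are only examples.

```pig
-- Keep only October rows whose hour falls between 09:00 and 17:59.
E = FILTER C BY GetMonth(newTime) == 10
            AND GetHour(newTime) >= 9
            AND GetHour(newTime) < 18;
DUMP E;
```

The same pattern works with GetDay() and GetYear() for day-of-month or year constraints.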