In an earlier article I mentioned that Pig can load just some of a file's columns, and that a file with only a single column loads fine as well.

But what happens when two files, f1 with 42 columns and f2 with 43 columns, are loaded into the same relation at once?

Answer: the load succeeds, but the relation has no schema. DESCRIBE then reports: Schema for origin_cleaned_data unknown.

The situation resembles UNION: merging two relations whose columns differ produces a relation with an unknown schema.

Background: the old log has 42 columns, and the new log adds one column after column 20, so from that point on the columns no longer line up between the two formats. The goal is to count the total number of user clicks across all logs, so both formats are loaded together and counted in one pass.

(If you knew which log format applies to which dates, you could load each set separately with an explicit schema, merge them with UNION ONSCHEMA, and then compute the statistics. Unfortunately, I inherited this project and don't know when the format was changed in production.)
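Because the switch date is unknown, one alternative (a suggestion, not something tested in this article) is to tell the two formats apart by field count instead of by date. The plain-Python sketch below groups comma-separated lines by how many fields they contain. It splits on every comma with no quote handling, which is roughly how PigStorage(',') treats a line. The function name split_by_width is hypothetical; the widths 42 and 43 come from the logs described above.

```python
from collections import defaultdict

OLD_WIDTH = 42  # fields per line in the old log format
NEW_WIDTH = 43  # fields per line in the new log format


def split_by_width(lines):
    """Group comma-separated lines by their number of fields.

    Returns a dict mapping field count -> list of field lists.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(",")  # plain split, no quoting rules
        groups[len(fields)].append(fields)
    return dict(groups)


if __name__ == "__main__":
    old_line = ",".join(["a"] * OLD_WIDTH)
    new_line = ",".join(["b"] * NEW_WIDTH)
    groups = split_by_width([old_line, new_line, old_line])
    print({width: len(rows) for width, rows in sorted(groups.items())})  # {42: 2, 43: 1}
```

Each group could then be written out separately and loaded with its own AS schema. Lines with any other width land in a group of their own, which makes malformed records easy to spot.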

Sample data: the old log is log_without.txt, the new log is log_with_android_ad_id.txt.

The code is as follows:

REGISTER piggybank.jar;
-- defined here but not used by the statements below
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

-- log_* is meant to match both the old (42-column) and the new (43-column) log files
%default cleanedLog /user/wizad/tmp/log_*
--%default cleanedLog1 /home/wizad/lmj/log_without.txt
--%default cleanedLog2 /home/wizad/lmj/log_with_android_ad_id.txt

-- no AS clause, so the relation is given no schema
origin_cleaned_data = LOAD '$cleanedLog' USING PigStorage(',');

DUMP origin_cleaned_data;
DESCRIBE origin_cleaned_data;

Output:

((null) 5,74,48809e40-b8d7-41a4-bf68-d0f8e28140ad,575356365101899146,2014-07-30 10:33:56,2014-07-30 10:33:56,1,57074,2,,,,,,,,151.87.202.1,1,-1,-1,lmj,-1,1ac2c73e-d93a-4801-a7ee-da05473d0585,48809e40-b8d7-41a4-bf68-d0f8e28140ad,02:00:00:00:00:00,1940064625594046032,,,,d70cc494,25100,206,,0,2,2,7.1,,,,42.833298,12.833298,120232210032202)

((null) 5,74,357633052513139,1033882907630785616,2014-07-30 11:15:05,2014-07-30 11:15:05,1,57074,2,,,,,,,,155.128.32.119,1,357633052513139,270f213575a4eda7,lmj,270f213575a4eda7,,,40:0e:85:40:0e:1a,-7537294162085162169,,,,7626e397,62713,206,,2,1,3,4.3,,,,37.774902,-122.4194,023010203333003)

((null) 5,74,e7a4afce-ffd9-4ecd-b916-39f9d793c218,207640323432175503,2014-07-30 10:29:22,2014-07-30 10:29:22,1,57074,2,,,,,,,,111.200.142.163,1,-1,-1,lmj,-1,14ea5e95237f34e278d7ac210173d6b8ad9d5026,e7a4afce-ffd9-4ecd-b916-39f9d793c218,02:00:00:00:00:00,1179719885610920154,,,,d4eeab6e,66104,101,,0,2,2,7.1,1,7,7,39.928894,116.388306,132100103322203)

((null) 5,74,48809e40-b8d7-41a4-bf68-d0f8e28140ad,575356365101899146,2014-07-30 10:33:56,2014-07-30 10:33:56,1,57074,2,,,,,,,,151.87.202.1,1,-1,-1,-1,1ac2c73e-d93a-4801-a7ee-da05473d0585,48809e40-b8d7-41a4-bf68-d0f8e28140ad,02:00:00:00:00:00,1940064625594046032,,,,d70cc494,25100,206,,0,2,2,7.1,,,,42.833298,12.833298,120232210032202)

((null) 5,74,302bd8f1-b974-4af5-8183-1f67d27410d6,367366268601246781,2014-07-30 10:07:57,2014-07-30 10:07:57,1,57074,2,,,,,,,,56.2.255.220,1,-1,-1,-1,c165376f9f76cf68862a505328b7ba7cd0cfa0b0,302bd8f1-b974-4af5-8183-1f67d27410d6,02:00:00:00:00:00,-488564527359896578,,,,103b14d3,25100,206,,0,2,2,7.1,,,,37.774902,-122.4194,023010203333003)

((null) 5,74,e7a4afce-ffd9-4ecd-b916-39f9d793c218,207640323432175503,2014-07-30 10:29:22,2014-07-30 10:29:22,1,57074,2,,,,,,,,111.200.142.163,1,-1,-1,-1,14ea5e95237f34e278d7ac210173d6b8ad9d5026,e7a4afce-ffd9-4ecd-b916-39f9d793c218,02:00:00:00:00:00,1179719885610920154,,,,d4eeab6e,66104,101,,0,2,2,7.1,1,7,7,39.928894,116.388306,132100103322203)

Schema for origin_cleaned_data unknown.

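The extra column is the one holding the value lmj: it appears in the first three rows (new format) and is missing from the last three (old format). As you can see, the relation has no schema. PigStorage without an AS clause does not declare one, and the rows do not even agree on a field count.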
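For a unified count, another option is to bring every row into the 43-column layout. In the sample rows above, lmj is the 21st field (0-based index 20), and the remaining fields line up once it is removed. The plain-Python sketch below uses a hypothetical to_new_layout helper that inserts None (standing in for Pig's null) at that index for old 42-field rows. The index is read off these sample rows and is an assumption for the full data set.

```python
OLD_WIDTH = 42
NEW_WIDTH = 43
INSERT_AT = 20  # assumed 0-based position of the extra column ("lmj" in the sample)


def to_new_layout(fields):
    """Return a copy of a split log line in the 43-field layout.

    43-field rows are returned unchanged; 42-field rows get None at INSERT_AT.
    """
    if len(fields) == NEW_WIDTH:
        return list(fields)
    if len(fields) == OLD_WIDTH:
        return list(fields[:INSERT_AT]) + [None] + list(fields[INSERT_AT:])
    raise ValueError(f"unexpected field count: {len(fields)}")


if __name__ == "__main__":
    old_row = [str(i) for i in range(OLD_WIDTH)]
    print(to_new_layout(old_row)[19:22])  # ['19', None, '20']
```

Fields before index 20 sit at the same position in both formats, so a count keyed on one of those columns would not need this step at all.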

UNION: merging relations whose columns have different types

(I won't go over the basics of UNION again.)

A = load 'input1' as (x:int, y:float);
B = load 'input2' as (x:int, y:chararray);
C = union A, B;
describe C;
Output:
Schema for C unknown

To union two relations whose columns differ, use ONSCHEMA.

Note: with ONSCHEMA, every input must have an explicit schema, otherwise Pig raises an error. The reason is that UNION ONSCHEMA matches columns by name and type (a narrower type is widened automatically, e.g. float to double).

In the merged result, columns that an input lacks are filled with null.

A = load 'input1' as (w: chararray, x:int, y:float);
B = load 'input2' as (x:int, y:double, z:chararray);
C = union onschema A, B;
describe C;
Output:
C: {w: chararray,x: int,y: double,z: chararray}
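To make these matching rules concrete, here is a simplified plain-Python model of how UNION ONSCHEMA combines schemas and rows. It reproduces the describe output above. It is not Pig's implementation: it widens only numeric types (int, long, float, double) and rejects every other type mismatch, and the function names are hypothetical. A schema of None stands for an unknown schema and is rejected, in the same way Pig rejects schema-less inputs.

```python
# Numeric types from narrowest to widest, following Pig's int -> long -> float -> double order.
NUMERIC_ORDER = ["int", "long", "float", "double"]


def widen(t1, t2):
    """Return the wider of two field types, or raise if this model treats them as incompatible."""
    if t1 == t2:
        return t1
    if t1 in NUMERIC_ORDER and t2 in NUMERIC_ORDER:
        return max(t1, t2, key=NUMERIC_ORDER.index)
    raise TypeError(f"incompatible types: {t1} vs {t2}")


def union_onschema_schema(*schemas):
    """Merge schemas by field name; each schema is a list of (name, type) pairs or None."""
    merged = {}  # dicts keep insertion order, so a field keeps its first position
    for schema in schemas:
        if schema is None:
            raise ValueError("UNION ONSCHEMA needs every input to have a schema")
        for name, ftype in schema:
            merged[name] = widen(merged[name], ftype) if name in merged else ftype
    return list(merged.items())


def union_onschema_rows(merged, inputs):
    """Yield each row as a tuple over the merged fields; absent fields become None (null)."""
    for schema, rows in inputs:
        names = [name for name, _ in schema]
        for row in rows:
            values = dict(zip(names, row))
            yield tuple(values.get(name) for name, _ in merged)


if __name__ == "__main__":
    a = [("w", "chararray"), ("x", "int"), ("y", "float")]
    b = [("x", "int"), ("y", "double"), ("z", "chararray")]
    merged = union_onschema_schema(a, b)
    print(merged)
    # [('w', 'chararray'), ('x', 'int'), ('y', 'double'), ('z', 'chararray')]
    print(list(union_onschema_rows(merged, [(a, [("a", 1, 1.5)]), (b, [(2, 2.5, "z")])])))
    # [('a', 1, 1.5, None), (None, 2, 2.5, 'z')]
```

Keying the merge on field names rather than positions is what lets inputs with different column counts line up. It is also why the model, like Pig, cannot do anything with an input that has no names at all.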

Here is an example that cannot be merged with UNION ONSCHEMA:

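%default cleanedLog1 /home/wizad/lmj/log_without.txt
%default cleanedLog2 /home/wizad/lmj/log_with_android_ad_id.txt

-- neither LOAD has an AS clause, so neither relation has a schema
origin1 = LOAD '$cleanedLog1' USING PigStorage(',');
origin2 = LOAD '$cleanedLog2' USING PigStorage(',');

DESCRIBE origin1;
DESCRIBE origin2;

-- ONSCHEMA requires a schema on every input, so this statement fails
origin = UNION ONSCHEMA origin1, origin2;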

Output:

DESCRIBE reports an unknown schema for both relations, e.g. Schema for origin2 unknown.

Therefore UNION ONSCHEMA fails and origin cannot be created.
